(57) Abstract: The present invention provides purified disease detection and treatment molecule polynucleotides (mddt). Also 
encompassed are the polypeptides (MDDT) encoded by mddt. The invention also provides for the use of mddt, or complements, 
oligonucleotides, or fragments thereof in diagnostic assays. The invention further provides for vectors and host cells containing mddt 
for the expression of MDDT. The invention additionally provides for the use of isolated and purified MDDT to induce antibodies 
and to screen libraries of compounds and the use of anti-MDDT antibodies in diagnostic assays. Also provided are microarrays 
containing mddt and methods of use. 

MOLECULES FOR DISEASE DETECTION AND TREATMENT 

TECHNICAL FIELD 

The present invention relates to molecules for disease detection and treatment and to the use 
of these sequences in the diagnosis, study, prevention, and treatment of diseases associated with 
disease detection and treatment molecules. 



BACKGROUND OF THE INVENTION 
The human genome comprises thousands of genes, many encoding gene products that 
function in the maintenance and growth of the various cells and tissues in the body. Aberrant 
expression of, or mutations in, these genes and their products are the cause of, or are associated with, a 
variety of human diseases such as cancer and other cell proliferative disorders. The identification of 
these genes and their products is the basis of an ever-expanding effort to find markers for early 
detection of diseases, and targets for their prevention and treatment. 

For example, cancer represents a type of cell proliferative disorder that affects nearly every 
tissue in the body. A wide variety of molecules, either aberrantly expressed or mutated, can be the 
cause of, or involved with, various cancers because tissue growth involves complex and ordered 
patterns of cell proliferation, cell differentiation, and apoptosis. Cell proliferation must be regulated 
to maintain both the number of cells and their spatial organization. This regulation depends upon the 
appropriate expression of proteins which control cell cycle progression in response to extracellular 
signals such as growth factors and other mitogens, and intracellular cues such as DNA damage or 
nutrient starvation. Molecules which directly or indirectly modulate cell cycle progression fall into 
several categories, including growth factors and their receptors, second messenger and signal 
transduction proteins, oncogene products, tumor-suppressor proteins, and mitosis-promoting factors. 
Aberrant expression or mutations in any of these gene products can result in cell proliferative 
disorders such as cancer. Oncogenes are genes generally derived from normal genes that, through 
abnormal expression or mutation, can effect the transformation of a normal cell to a malignant one 
(oncogenesis). Oncoproteins, encoded by oncogenes, can affect cell proliferation in a variety of ways 
and include growth factors, growth factor receptors, intracellular signal transducers, nuclear 
transcription factors, and cell-cycle control proteins. In contrast, tumor-suppressor genes are 
involved in inhibiting cell proliferation. Mutations which cause reduction or loss of function in 
tumor-suppressor genes result in aberrant cell proliferation and cancer. Thus a wide variety of genes 
and their products have been found that are associated with cell proliferative disorders such as cancer, 
but many more may exist that are yet to be discovered. 

DNA-based arrays can provide a simple way to explore the expression of a single 
polymorphic gene or a large number of genes. When the expression of a single gene is explored, 
DNA-based arrays are employed to detect the expression of specific gene variants. For example, a 
p53 tumor suppressor gene array is used to determine whether individuals are carrying mutations that 
predispose them to cancer. A cytochrome p450 gene array is useful to determine whether individuals 
have one of a number of specific mutations that could result in increased drug metabolism, drug 
resistance or drug toxicity. 

DNA-based array technology is especially relevant for the rapid screening of expression of a 
large number of genes. There is a growing awareness that gene expression is affected in a global 
fashion. A genetic predisposition, disease or therapeutic treatment may affect, directly or indirectly, 
the expression of a large number of genes. In some cases the interactions may be expected, such as 
when the genes are part of the same signaling pathway. In other cases, such as when the genes 
participate in separate signaling pathways, the interactions may be totally unexpected. Therefore, 
DNA-based arrays can be used to investigate how genetic predisposition, disease, or therapeutic 
treatment affects the expression of a large number of genes. 

The discovery of new molecules for disease detection and treatment satisfies a need in the art 
by providing new compositions which are useful in the diagnosis, study, prevention, and treatment of 
diseases. 

SUMMARY OF THE INVENTION 

The present invention relates to human polynucleotides encoding molecules for disease 
detection and treatment (mddt) as presented in the Sequence Listing. Some of the mddt uniquely 
identify genes encoding structural, functional, and regulatory molecules for disease detection and 
treatment. 

The invention provides an isolated polynucleotide comprising a polynucleotide sequence 
selected from the group consisting of a) a polynucleotide sequence selected from the group consisting 
of SEQ ID NO:1-14; b) a naturally occurring polynucleotide sequence having at least 90% sequence 
identity to a polynucleotide sequence selected from the group consisting of SEQ ID NO:1-14; c) a 
polynucleotide sequence complementary to a); d) a polynucleotide sequence complementary to b); 
and e) an RNA equivalent of a) through d). In one alternative, the polynucleotide comprises a 
polynucleotide sequence selected from the group consisting of SEQ ID NO:1-14. In another 
alternative, the polynucleotide comprises at least 60 contiguous nucleotides of a polynucleotide 
sequence selected from the group consisting of a) a polynucleotide sequence selected from the group 
consisting of SEQ ID NO:1-14; b) a naturally occurring polynucleotide sequence having at least 90% 
sequence identity to a polynucleotide sequence selected from the group consisting of SEQ ID NO:1-14; 
c) a polynucleotide sequence complementary to a); d) a polynucleotide sequence complementary 
to b); and e) an RNA equivalent of a) through d). The invention further provides a composition for 
the detection of expression of disease detection and treatment molecule polynucleotides comprising at 
least one isolated polynucleotide comprising a polynucleotide sequence selected from the group 
consisting of a) a polynucleotide sequence selected from the group consisting of SEQ ID NO:1-14; b) 
a naturally occurring polynucleotide sequence having at least 90% sequence identity to a 
polynucleotide sequence selected from the group consisting of SEQ ID NO:1-14; c) a polynucleotide 
sequence complementary to a); d) a polynucleotide sequence complementary to b); and e) an RNA 
equivalent of a) through d); and a detectable label. 

The invention also provides a method for detecting a target polynucleotide in a sample, said 
target polynucleotide comprising a polynucleotide sequence selected from the group consisting of a) a 
polynucleotide sequence selected from the group consisting of SEQ ID NO:1-14; b) a naturally 
occurring polynucleotide sequence having at least 90% sequence identity to a polynucleotide 
sequence selected from the group consisting of SEQ ID NO:1-14; c) a polynucleotide sequence 
complementary to a); d) a polynucleotide sequence complementary to b); and e) an RNA equivalent 
of a) through d). The method comprises a) hybridizing the sample with a probe comprising at least 20 
contiguous nucleotides comprising a sequence complementary to said target polynucleotide in the 
sample, and which probe specifically hybridizes to said target polynucleotide, under conditions 
whereby a hybridization complex is formed between said probe and said target polynucleotide, and b) 
detecting the presence or absence of said hybridization complex, and, optionally, if present, the 
amount thereof. In one alternative, the probe comprises at least 30 contiguous nucleotides. In 
another alternative, the probe comprises at least 60 contiguous nucleotides. 

The invention further provides a recombinant polynucleotide comprising a promoter sequence 
operably linked to an isolated polynucleotide comprising a polynucleotide sequence selected from the 
group consisting of a) a polynucleotide sequence selected from the group consisting of SEQ ID NO:1-14; 
b) a naturally occurring polynucleotide sequence having at least 90% sequence identity to a 
polynucleotide sequence selected from the group consisting of SEQ ID NO:1-14; c) a polynucleotide 
sequence complementary to a); d) a polynucleotide sequence complementary to b); and e) an RNA 
equivalent of a) through d). In one alternative, the invention provides a cell transformed with the 
recombinant polynucleotide. In another alternative, the invention provides a transgenic organism 
comprising the recombinant polynucleotide. In a further alternative, the invention provides a method 
for producing a disease detection and treatment molecule polypeptide, the method comprising a) 
culturing a cell under conditions suitable for expression of the disease detection and treatment 
molecule polypeptide, wherein said cell is transformed with the recombinant polynucleotide, and b) 
recovering the disease detection and treatment molecule polypeptide so expressed. 

The invention also provides a purified disease detection and treatment molecule polypeptide 
(MDDT) encoded by at least one polynucleotide comprising a polynucleotide sequence selected from 
the group consisting of SEQ ID NO:1-14. Additionally, the invention provides an isolated antibody 
which specifically binds to the disease detection and treatment molecule polypeptide. The invention 
further provides a method of identifying a test compound which specifically binds to the disease 
detection and treatment molecule polypeptide, the method comprising the steps of a) providing a test 
compound; b) combining the disease detection and treatment molecule polypeptide with the test 
compound for a sufficient time and under suitable conditions for binding; and c) detecting binding of 
the disease detection and treatment molecule polypeptide to the test compound, thereby identifying 
the test compound which specifically binds the disease detection and treatment molecule polypeptide. 

The invention further provides a microarray wherein at least one element of the microarray is 
an isolated polynucleotide comprising at least 60 contiguous nucleotides of a polynucleotide 
comprising a polynucleotide sequence selected from the group consisting of a) a polynucleotide 
sequence selected from the group consisting of SEQ ID NO:1-14; b) a naturally occurring 
polynucleotide sequence having at least 90% sequence identity to a polynucleotide sequence selected 
from the group consisting of SEQ ID NO:1-14; c) a polynucleotide sequence complementary to a); d) 
a polynucleotide sequence complementary to b); and e) an RNA equivalent of a) through d). The 
invention also provides a method of using the microarray for generating a transcript image of a 
sample which contains polynucleotides. The method comprises a) labeling the polynucleotides of the 
sample, b) contacting the elements of the microarray with the labeled polynucleotides of the sample 
under conditions suitable for the formation of a hybridization complex, and c) quantifying the 
expression of the polynucleotides in the sample. 

Additionally, the invention provides a method for screening a compound for effectiveness in 
altering expression of a target polynucleotide, wherein said target polynucleotide comprises a 
polynucleotide sequence selected from the group consisting of a) a polynucleotide sequence selected 
from the group consisting of SEQ ID NO:1-14; b) a naturally occurring polynucleotide sequence 
having at least 90% sequence identity to a polynucleotide sequence selected from the group 
consisting of SEQ ID NO:1-14; c) a polynucleotide sequence complementary to a); d) a 
polynucleotide sequence complementary to b); and e) an RNA equivalent of a) through d). The 
method comprises a) exposing a sample comprising the target polynucleotide to a compound, and b) 
detecting altered expression of the target polynucleotide. 

The invention further provides a method for detecting a target polynucleotide in a sample for 
toxicity testing of a compound, said target polynucleotide comprising a polynucleotide sequence 
selected from the group consisting of a) a polynucleotide sequence selected from the group consisting 
of SEQ ID NO:1-14; b) a naturally occurring polynucleotide sequence having at least 90% sequence 
identity to a polynucleotide sequence selected from the group consisting of SEQ ID NO:1-14; c) a 
polynucleotide sequence complementary to a); d) a polynucleotide sequence complementary to b); 
and e) an RNA equivalent of a) through d). The method comprises a) hybridizing the sample with a 
probe comprising at least 20 contiguous nucleotides comprising a sequence complementary to said 
target polynucleotide in the sample, and which probe specifically hybridizes to said target 
polynucleotide, under conditions whereby a hybridization complex is formed between said probe and 
said target polynucleotide, b) detecting the presence or absence of said hybridization complex, and, 
optionally, if present, the amount thereof, and c) comparing the presence, absence or amount of said 
target polynucleotide in a first biological sample and a second biological sample, wherein said first 
biological sample has been contacted with said compound, and said second sample is a control, 
whereby a change in presence, absence or amount of said target polynucleotide in said first sample, as 
compared with said second sample, is indicative of toxic response to said compound. 

DESCRIPTION OF THE TABLES 

Table 1 shows the sequence identification numbers (SEQ ID NO:s) and template 
identification numbers (template IDs) corresponding to the polynucleotides of the present invention, 
along with their GenBank hits (GI Numbers), probability scores, and functional annotations 
corresponding to the GenBank hits. 

Table 2 shows the sequence identification numbers (SEQ ID NO:s) and template 
identification numbers (template IDs) corresponding to the polynucleotides of the present invention, 
along with polynucleotide segments of each template sequence as defined by the indicated "start" and 
"stop" nucleotide positions. The reading frames of the polynucleotide segments and the Pfam hits, 
Pfam descriptions, and E-values corresponding to the polypeptide domains encoded by the 
polynucleotide segments are indicated. 

Table 3 shows the sequence identification numbers (SEQ ID NO:s) and template 
identification numbers (template IDs) corresponding to the polynucleotides of the present invention, 
along with polynucleotide segments of each template sequence as defined by the indicated "start" and 
"stop" nucleotide positions. The reading frames of the polynucleotide segments are shown, and the 
polypeptides encoded by the polynucleotide segments constitute either signal peptide (SP) or 
transmembrane (TM) domains, as indicated. 

Table 4 shows the sequence identification numbers (SEQ ID NO:s) and template 
identification numbers (template IDs) corresponding to the polynucleotides of the present invention, 
along with component sequence identification numbers (component IDs) corresponding to each 
template. The component sequences, which were used to assemble the template sequences, are 
defined by the indicated "start" and "stop" nucleotide positions along each template. 

Table 5 summarizes the bioinformatics tools which are useful for analysis of the 
polynucleotides of the present invention. The first column of Table 5 lists analytical tools, programs, 
and algorithms, the second column provides brief descriptions thereof, the third column presents 
appropriate references, all of which are incorporated by reference herein in their entirety, and the 
fourth column presents, where applicable, the scores, probability values, and other parameters used to 
evaluate the strength of a match between two sequences (the higher the score, the greater the 
homology between two sequences). 

DETAILED DESCRIPTION OF THE INVENTION 

Before the nucleic acid sequences and methods are presented, it is to be understood that this 
invention is not limited to the particular machines, methods, and materials described. Although 
particular embodiments are described, machines, methods, and materials similar or equivalent to these 
embodiments may be used to practice the invention. The preferred machines, methods, and materials 
set forth are not intended to limit the scope of the invention which is limited only by the appended 
claims. 

The singular forms "a", "an", and "the" include plural reference unless the context clearly 
dictates otherwise. All technical and scientific terms have the meanings commonly understood by 
one of ordinary skill in the art. All publications are incorporated by reference for the purpose of 
describing and disclosing the cell lines, vectors, and methodologies which are presented and which 
might be used in connection with the invention. Nothing in the specification is to be construed as an 
admission that the invention is not entitled to antedate such disclosure by virtue of prior invention. 


Definitions 

As used herein, the lower case "mddt" refers to a nucleic acid sequence, while the upper case 
"MDDT" refers to an amino acid sequence encoded by mddt. A "full-length" mddt refers to a nucleic 
acid sequence containing the entire coding region of a gene endogenously expressed in human tissue. 

"Adjuvants" are materials such as Freund's adjuvant, mineral gels (aluminum hydroxide), and 
surface active substances (lysolecithin, pluronic polyols, polyanions, peptides, oil emulsions, keyhole 
limpet hemocyanin, and dinitrophenol) which may be administered to increase a host's 
immunological response. 

"Allele" refers to an alternative form of a nucleic acid sequence. Alleles result from a 
"mutation," a change or an alternative reading of the genetic code. Any given gene may have none, 
one, or many allelic forms. Mutations which give rise to alleles include deletions, additions, or 
substitutions of nucleotides. Each of these changes may occur alone, or in combination with the 
others, one or more times in a given nucleic acid sequence. The present invention encompasses 
allelic mddt. 

"Amino acid sequence" refers to a peptide, a polypeptide, or a protein of either natural or 
synthetic origin. The amino acid sequence is not limited to the complete, endogenous amino acid 
sequence and may be a fragment, epitope, variant, or derivative of a protein expressed by a nucleic 
acid sequence. 

"Amplification" refers to the production of additional copies of a sequence and is carried out 
using polymerase chain reaction (PCR) technologies well known in the art. 

"Antibody" refers to intact molecules as well as to fragments thereof, such as Fab, F(ab')2, 
and Fv fragments, which are capable of binding the epitopic determinant. Antibodies that bind 
MDDT polypeptides can be prepared using intact polypeptides or using fragments containing small 
peptides of interest as the immunizing antigen. The polypeptide or peptide used to immunize an 
animal (e.g., a mouse, a rat, or a rabbit) can be derived from the translation of RNA, or synthesized 
chemically, and can be conjugated to a carrier protein if desired. Commonly used carriers that are 
chemically coupled to peptides include bovine serum albumin, thyroglobulin, and keyhole limpet 
hemocyanin (KLH). The coupled peptide is then used to immunize the animal. 

"Antisense sequence" refers to a sequence capable of specifically hybridizing to a target 
sequence. The antisense sequence may include DNA, RNA, or any nucleic acid mimic or analog such 
as peptide nucleic acid (PNA); oligonucleotides having modified backbone linkages such as 
phosphorothioates, methylphosphonates, or benzylphosphonates; oligonucleotides having modified 
sugar groups such as 2-methoxyethyl sugars or 2-methoxyethoxy sugars; or oligonucleotides having 
modified bases such as 5-methyl cytosine, 2'-deoxyuracil, or 7-deaza-2'-deoxyguanosine. 

"Antisense sequence" refers to a sequence capable of specifically hybridizing to a target 
sequence. The antisense sequence can be DNA, RNA, or any nucleic acid mimic or analog. 

"Antisense technology" refers to any technology which relies on the specific hybridization of 
an antisense sequence to a target sequence. 

A "bin" is a portion of computer memory space used by a computer program for storage of 
data, and bounded in such a manner that data stored in a bin may be retrieved by the program. 

"Biologically active" refers to an amino acid sequence having a structural, regulatory, or 
biochemical function of a naturally occurring amino acid sequence. 

"Clone joining" is a process for combining gene bins based upon the bins' containing 
sequence information from the same clone. The sequences may assemble into a primary gene 
transcript as well as one or more splice variants. 

"Complementary" describes the relationship between two single-stranded nucleic acid 
sequences that anneal by base-pairing (5'-A-G-T-3' pairs with its complement 3'-T-C-A-5'). 
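
By way of illustration only, the base-pairing relationship in the preceding definition can be sketched in Python. The pairing map and the function names (complement, reverse_complement, are_complementary) are assumptions introduced for this example and are not part of the specification; only the four standard DNA bases are handled.

```python
# Watson-Crick pairing for the four standard DNA bases (illustrative assumption).
_PAIR = str.maketrans("ACGTacgt", "TGCAtgca")


def complement(seq: str) -> str:
    """Return the base-by-base complement, read in the same left-to-right order."""
    return seq.translate(_PAIR)


def reverse_complement(seq: str) -> str:
    """Return the complementary strand written in the conventional 5'->3' direction."""
    return complement(seq)[::-1]


def are_complementary(a: str, b: str) -> bool:
    """True if two strands, both written 5'->3', pair perfectly along their full length."""
    return a.upper() == reverse_complement(b).upper()


print(complement("AGT"))          # TCA, i.e. the 3'-T-C-A-5' strand of the example
print(reverse_complement("AGT"))  # ACT, the same strand written 5'->3'
```

The sketch treats complementarity as exact, full-length pairing; partial or mismatched annealing is outside its scope.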

A "component sequence" is a nucleic acid sequence selected by a computer program such as 
PHRED and used to assemble a consensus or template sequence from one or more component 
sequences. 

A "consensus sequence" or "template sequence" is a nucleic acid sequence which has been 
assembled from overlapping sequences, using a computer program for fragment assembly such as the 
GELVIEW fragment assembly system (Genetics Computer Group (GCG), Madison WI) or using a 
relational database management system (RDMS). 

"Conservative amino acid substitutions" are those substitutions that, when made, least 
interfere with the properties of the original protein, i.e., the structure and especially the function of 
the protein is conserved and not significantly changed by such substitutions. The table below shows 
amino acids which may be substituted for an original amino acid in a protein and which are regarded 
as conservative substitutions. 

Original Residue        Conservative Substitution 
Ala                     Gly, Ser 
Arg                     His, Lys 
Asn                     Asp, Gln, His 
Asp                     Asn, Glu 
Cys                     Ala, Ser 
Gln                     Asn, Glu, His 
Glu                     Asp, Gln, His 
Gly                     Ala 
His                     Asn, Arg, Gln, Glu 
Ile                     Leu, Val 
Leu                     Ile, Val 
Lys                     Arg, Gln, Glu 
Met                     Leu, Ile 
Phe                     His, Met, Leu, Trp, Tyr 
Ser                     Cys, Thr 
Thr                     Ser, Val 
Trp                     Phe, Tyr 
Tyr                     His, Phe, Trp 
Val                     Ile, Leu, Thr 

Conservative substitutions generally maintain (a) the structure of the polypeptide backbone in 
the area of the substitution, for example, as a beta sheet or alpha helical conformation, (b) the charge 
or hydrophobicity of the molecule at the target site, or (c) the bulk of the side chain. 
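
As a minimal sketch, the substitution table above can be held as a lookup keyed by the original residue. The dictionary name CONSERVATIVE and the helper is_conservative are hypothetical names chosen for this example; the entries use standard three-letter residue codes and follow the table as printed, which is directional (for instance, Cys may be replaced by Ala, but Ala is not listed as replaceable by Cys) and has no entry for Pro.

```python
# Conservative substitutions keyed by the original residue (three-letter codes).
# The mapping mirrors the table in the text; it is not symmetric by construction.
CONSERVATIVE = {
    "Ala": {"Gly", "Ser"},
    "Arg": {"His", "Lys"},
    "Asn": {"Asp", "Gln", "His"},
    "Asp": {"Asn", "Glu"},
    "Cys": {"Ala", "Ser"},
    "Gln": {"Asn", "Glu", "His"},
    "Glu": {"Asp", "Gln", "His"},
    "Gly": {"Ala"},
    "His": {"Asn", "Arg", "Gln", "Glu"},
    "Ile": {"Leu", "Val"},
    "Leu": {"Ile", "Val"},
    "Lys": {"Arg", "Gln", "Glu"},
    "Met": {"Leu", "Ile"},
    "Phe": {"His", "Met", "Leu", "Trp", "Tyr"},
    "Ser": {"Cys", "Thr"},
    "Thr": {"Ser", "Val"},
    "Trp": {"Phe", "Tyr"},
    "Tyr": {"His", "Phe", "Trp"},
    "Val": {"Ile", "Leu", "Thr"},
}


def is_conservative(original: str, replacement: str) -> bool:
    """True if the table lists `replacement` as a conservative substitute for `original`."""
    return replacement in CONSERVATIVE.get(original, set())


print(is_conservative("Cys", "Ala"))  # True
print(is_conservative("Ala", "Cys"))  # False: the table is directional
```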

"Deletion" refers to a change in either a nucleic or amino acid sequence in which at least one 
nucleotide or amino acid residue, respectively, is absent. 

"Derivative" refers to the chemical modification of a nucleic acid sequence, such as by 
replacement of hydrogen by an alkyl, acyl, amino, hydroxyl, or other group. 

The terms "element" and "array element" refer to a polynucleotide, polypeptide, or other 
chemical compound having a unique and defined position on a microarray. 

"E-value" refers to the statistical probability that a match between two sequences occurred by 
chance. 

A "fragment" is a unique portion of mddt or MDDT which is identical in sequence to but 
shorter in length than the parent sequence. A fragment may comprise up to the entire length of the 
defined sequence, minus one nucleotide/amino acid residue. For example, a fragment may comprise 
from 10 to 1000 contiguous amino acid residues or nucleotides. A fragment used as a probe, primer, 
antigen, therapeutic molecule, or for other purposes, may be at least 5, 10, 15, 16, 20, 25, 30, 40, 50, 
60, 75, 100, 150, 250 or at least 500 contiguous amino acid residues or nucleotides in length. 
Fragments may be preferentially selected from certain regions of a molecule. For example, a 
polypeptide fragment may comprise a certain length of contiguous amino acids selected from the first 
250 or 500 amino acids (or first 25% or 50%) of a polypeptide as shown in a certain defined 
sequence. Clearly these lengths are exemplary, and any length that is supported by the specification, 
including the Sequence Listing and the figures, may be encompassed by the present embodiments. 

A fragment of mddt comprises a region of unique polynucleotide sequence that specifically 
identifies mddt, for example, as distinct from any other sequence in the same genome. A fragment of 
mddt is useful, for example, in hybridization and amplification technologies and in analogous 
methods that distinguish mddt from related polynucleotide sequences. The precise length of a 
fragment of mddt and the region of mddt to which the fragment corresponds are routinely 
determinable by one of ordinary skill in the art based on the intended purpose for the fragment. 
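
One way to picture a fragment that "specifically identifies" a sequence is a fixed-length window that occurs in the target but in none of a set of comparison sequences. The following Python sketch illustrates that idea under stated assumptions: the function name unique_fragments, the exact-match criterion, and the use of 0-based start positions are choices made for this example, and the sketch ignores reverse-complement and near-match occurrences that a practical probe design would also consider.

```python
from typing import List, Tuple


def unique_fragments(target: str, background: List[str], length: int) -> List[Tuple[int, str]]:
    """Return (start, fragment) for each window of `length` contiguous bases of
    `target` that does not occur exactly in any background sequence.
    Start positions are 0-based."""
    target = target.upper()
    others = [s.upper() for s in background]
    found = []
    for start in range(len(target) - length + 1):
        window = target[start:start + length]
        if not any(window in other for other in others):
            found.append((start, window))
    return found


# Toy sequences, invented for illustration only.
print(unique_fragments("ACGTACGGA", ["TTACGTACTT"], 5))
# [(2, 'GTACG'), (3, 'TACGG'), (4, 'ACGGA')]
```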

A fragment of MDDT is encoded by a fragment of mddt. A fragment of MDDT comprises a 
region of unique amino acid sequence that specifically identifies MDDT. For example, a fragment of 
MDDT is useful as an immunogenic peptide for the development of antibodies that specifically 
recognize MDDT. The precise length of a fragment of MDDT and the region of MDDT to which the 
fragment corresponds are routinely determinable by one of ordinary skill in the art based on the 
intended purpose for the fragment. 

A "full length" nucleotide sequence is one containing at least a start site for translation to a 
protein sequence, followed by an open reading frame and a stop site, and encoding a "full length" 
polypeptide. 
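
The structure named in this definition (a translation start site, an open reading frame, and a stop site) can be sketched as a simple scan. The example below assumes the standard genetic code, an ATG start codon, and the stop codons TAA, TAG, and TGA; these assumptions, and the function name first_complete_orf, are introduced here for illustration and are not taken from the specification.

```python
from typing import Optional, Tuple

STOP_CODONS = {"TAA", "TAG", "TGA"}  # standard genetic code (assumption)


def first_complete_orf(seq: str, start_codon: str = "ATG") -> Optional[Tuple[int, int]]:
    """Return (start, end) of the first start codon that is followed, in frame,
    by a stop codon; `start` is the 0-based index of the start codon and `end`
    is the index just past the stop codon. Return None if no such ORF exists."""
    seq = seq.upper()
    start = seq.find(start_codon)
    while start != -1:
        # Walk codon by codon in the reading frame set by this start codon.
        for pos in range(start + 3, len(seq) - 2, 3):
            if seq[pos:pos + 3] in STOP_CODONS:
                return start, pos + 3
        start = seq.find(start_codon, start + 1)
    return None


print(first_complete_orf("CCATGAAATTTTAGGG"))  # (2, 14): ATG AAA TTT TAG
print(first_complete_orf("ATGAAATTT"))         # None: no in-frame stop codon
```

Only the forward strand is scanned; a fuller treatment would also examine the complementary strand.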

"Hit" refers to a sequence whose annotation will be used to describe a given template. 
Criteria for selecting the top hit are as follows: if the template has one or more exact nucleic acid 
matches, the top hit is the exact match with highest percent identity. If the template has no exact 
matches but has significant protein hits, the top hit is the protein hit with the lowest E-value. If the 
template has no significant protein hits, but does have significant non-exact nucleotide hits, the top hit 
is the nucleotide hit with the lowest E-value. 
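
The three selection criteria above can be sketched as a short prioritized search. In this Python example the Hit record, its field names, and the max_e_value cutoff standing in for "significant" are all hypothetical; the text does not state a numerical significance threshold at this point, so the default value is a placeholder only.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass
class Hit:
    """One database match for a template; all field names are hypothetical."""
    accession: str
    kind: str                # "nucleotide" or "protein"
    exact: bool              # True for an exact nucleic acid match
    percent_identity: float
    e_value: float


def top_hit(hits: Iterable[Hit], max_e_value: float = 1e-8) -> Optional[Hit]:
    """Apply the three criteria in order; max_e_value is a placeholder cutoff."""
    hits = list(hits)
    # Criterion 1: exact nucleic acid matches win; take the highest percent identity.
    exact = [h for h in hits if h.kind == "nucleotide" and h.exact]
    if exact:
        return max(exact, key=lambda h: h.percent_identity)
    # Criteria 2 and 3: significant protein hits first, then significant
    # non-exact nucleotide hits; take the lowest E-value in the first non-empty group.
    for kind in ("protein", "nucleotide"):
        group = [h for h in hits if h.kind == kind and h.e_value <= max_e_value]
        if group:
            return min(group, key=lambda h: h.e_value)
    return None


example = [
    Hit("prot_A", "protein", False, 62.0, 1e-30),
    Hit("nuc_B", "nucleotide", False, 91.0, 1e-50),
]
print(top_hit(example).accession)  # prot_A: protein hits outrank non-exact nucleotide hits
```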

"Homology" refers to sequence similarity either between a reference nucleic acid sequence 
and at least a fragment of an mddt or between a reference amino acid sequence and a fragment of an 
MDDT. 

"Hybridization" refers to the process by which a strand of nucleotides anneals with a 
complementary strand through base pairing. Specific hybridization is an indication that two nucleic 
acid sequences share a high degree of identity. Specific hybridization complexes form under defined 
annealing conditions, and remain hybridized after the "washing" step. The defined hybridization 
conditions include the annealing conditions and the washing step(s), the latter of which is particularly 
important in determining the stringency of the hybridization process, with more stringent conditions 
allowing less non-specific binding, i.e., binding between pairs of nucleic acid probes that are not 
perfectly matched. Permissive conditions for annealing of nucleic acid sequences are routinely 
determinable and may be consistent among hybridization experiments, whereas wash conditions may 
be varied among experiments to achieve the desired stringency. 

Generally, stringency of hybridization is expressed with reference to the temperature under 
which the wash step is carried out. Generally, such wash temperatures are selected to be about 5°C to 
20°C lower than the thermal melting point (Tm) for the specific sequence at a defined ionic strength 
and pH. The Tm is the temperature (under defined ionic strength and pH) at which 50% of the target 
sequence hybridizes to a perfectly matched probe. An equation for calculating Tm and conditions for 
nucleic acid hybridization are well known and can be found in Sambrook et al., 1989, Molecular 
Cloning: A Laboratory Manual, 2nd ed., vol. 1-3, Cold Spring Harbor Press, Plainview NY; 
specifically see volume 2, chapter 9. 
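
For orientation only, the relationship between Tm and wash temperature can be sketched numerically. The formula used below, Tm = 81.5 + 16.6 log10[Na+] + 0.41(%G+C) - 675/N, is one commonly cited empirical approximation for long DNA duplexes; it is an assumption of this example and is not asserted to be the exact equation given in the cited reference. The function names and the sodium concentration argument are likewise illustrative.

```python
import math
from typing import Tuple


def melting_temperature(seq: str, sodium_molar: float) -> float:
    """Approximate Tm in degrees C for a long DNA duplex, using the empirical form
    81.5 + 16.6*log10([Na+]) + 0.41*(%G+C) - 675/N (an assumed approximation)."""
    seq = seq.upper()
    n = len(seq)
    if n == 0 or sodium_molar <= 0:
        raise ValueError("sequence must be non-empty and [Na+] must be positive")
    gc_percent = 100.0 * (seq.count("G") + seq.count("C")) / n
    return 81.5 + 16.6 * math.log10(sodium_molar) + 0.41 * gc_percent - 675.0 / n


def wash_temperature_range(tm: float) -> Tuple[float, float]:
    """Wash temperatures about 5 to 20 degrees C below Tm, as (lowest, highest)."""
    return tm - 20.0, tm - 5.0


# Toy input: a 100-nucleotide, 50% G+C sequence at 1.0 M sodium.
tm = melting_temperature("ACGT" * 25, sodium_molar=1.0)
low, high = wash_temperature_range(tm)
print(round(tm, 2), round(low, 2), round(high, 2))
```

Lower salt or lower G+C content reduces the estimate, which is consistent with the use of low-salt washes to raise stringency.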

High stringency conditions for hybridization between polynucleotides of the present 
invention include wash conditions of 68°C in the presence of about 0.2 x SSC and about 0.1% SDS, 
for 1 hour. Alternatively, temperatures of about 65°C, 60°C, or 55°C may be used. SSC 
concentration may be varied from about 0.2 to 2 x SSC, with SDS being present at about 0.1%. 
Typically, blocking reagents are used to block non-specific hybridization. Such blocking reagents 
include, for instance, denatured salmon sperm DNA at about 100-200 μg/ml. Useful variations on 
these conditions will be readily apparent to those skilled in the art. Hybridization, particularly under 
high stringency conditions, may be suggestive of evolutionary similarity between the nucleotides. 
Such similarity is strongly indicative of a similar role for the nucleotides and their resultant proteins. 

Other parameters, such as temperature, salt concentration, and detergent concentration may 
be varied to achieve the desired stringency. Denaturants, such as formamide at a concentration of 
about 35-50% v/v, may also be used under particular circumstances, such as RNA:DNA 
hybridizations. Appropriate hybridization conditions are routinely determinable by one of ordinary 
skill in the art. 

"Immunogenic" describes the potential for a natural, recombinant, or synthetic peptide, 
epitope, polypeptide, or protein to induce antibody production in appropriate animals, cells, or cell 
lines. 

"Insertion" or "addition" refers to a change in either a nucleic or amino acid sequence in 
which at least one nucleotide or residue, respectively, is added to the sequence. 

"Labeling" refers to the covalent or noncovalent joining of a polynucleotide, polypeptide, or 
antibody with a reporter molecule capable of producing a detectable or measurable signal. 

"Microarray" is any arrangement of nucleic acids, amino acids, antibodies, etc., on a 
substrate. The substrate may be a solid support such as beads, glass, paper, nitrocellulose, nylon, or 
an appropriate membrane. 

"Linkers" are short stretches of nucleotide sequence which may be added to a vector or an 
mddt to create restriction endonuclease sites to facilitate cloning. "Polylinkers" are engineered to 
incorporate multiple restriction enzyme sites and to provide for the use of enzymes which leave 5' or 
3' overhangs (e.g., BamHI, EcoRI, and HindIII) and those which provide blunt ends (e.g., EcoRV, 
SnaBI, and StuI). 

"Naturally occurring" refers to an endogenous polynucleotide or polypeptide that may be 
isolated from viruses or prokaryotic or eukaryotic cells. 

"Nucleic acid sequence" refers to the specific order of nucleotides joined by phosphodiester 
bonds in a linear, polymeric arrangement. Depending on the number of nucleotides, the nucleic acid 
sequence can be considered an oligomer, oligonucleotide, or polynucleotide. The nucleic acid can be 
DNA, RNA, or any nucleic acid analog, such as PNA, may be of genomic or synthetic origin, may be 
either double-stranded or single-stranded, and can represent either the sense or antisense 
(complementary) strand. 
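
As an illustration of the sense and antisense strands mentioned above, the following minimal sketch derives the complementary strand of a DNA sequence, written 5' to 3'. The function name and the example sequence are hypothetical and are not taken from the Sequence Listing.

```python
# Illustrative sketch only: derive the antisense (complementary) strand
# of a DNA sequence. The name reverse_complement is an assumption.
_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq: str) -> str:
    """Return the complementary strand of seq, read 5' to 3'."""
    # Complement each base, then reverse so the result reads 5' to 3'.
    return seq.translate(_COMPLEMENT)[::-1]


sense = "ATGCCGTT"  # hypothetical sense-strand fragment
antisense = reverse_complement(sense)
print(sense, antisense)
```

Applying the function twice returns the original sequence, which is a convenient sanity check.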

"Oligomer" refers to a nucleic acid sequence of at least about 6 nucleotides and as many as 
about 60 nucleotides, preferably about 15 to 40 nucleotides, and most preferably between about 20 
and 30 nucleotides, that may be used in hybridization or amplification technologies. Oligomers may 
be used as, e.g., primers for PCR, and are usually chemically synthesized. 

"Operably linked" refers to the situation in which a first nucleic acid sequence is placed in a 
functional relationship with the second nucleic acid sequence. For instance, a promoter is operably 
linked to a coding sequence if the promoter affects the transcription or expression of the coding 
sequence. Generally, operably linked DNA sequences may be in close proximity or contiguous and, 
where necessary to join two protein coding regions, in the same reading frame. 

"Peptide nucleic acid" (PNA) refers to a DNA mimic in which nucleotide bases are attached 
to a pseudopeptide backbone to increase stability. PNAs, also designated antigene agents, can 
prevent gene expression by targeting complementary messenger RNA. 

The phrases "percent identity" and "% identity", as applied to polynucleotide sequences, 
refer to the percentage of residue matches between at least two polynucleotide sequences aligned 
using a standardized algorithm. Such an algorithm may insert, in a standardized and reproducible 
way, gaps in the sequences being compared in order to optimize alignment between two sequences, 
and therefore achieve a more meaningful comparison of the two sequences. 
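
The notion of percent identity over a gapped alignment can be sketched as follows. This is a minimal illustration, not the CLUSTAL V or BLAST computation: it assumes the two sequences have already been aligned to equal length with "-" marking gaps, and it assumes that identity is counted as matching columns divided by all alignment columns. Other conventions (for example, dividing by the length of the shorter sequence) give different values.

```python
# Illustrative sketch only. Assumptions: inputs are pre-aligned, "-" marks a
# gap, and the denominator is the total number of alignment columns.
def percent_identity(aligned_a: str, aligned_b: str, gap: str = "-") -> float:
    """Percentage of alignment columns holding the same residue."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    if not aligned_a:
        raise ValueError("alignment is empty")
    # A column counts as a match only if both residues agree and are not gaps.
    matches = sum(
        1 for a, b in zip(aligned_a, aligned_b) if a == b and a != gap
    )
    return 100.0 * matches / len(aligned_a)


# Hypothetical alignment: one gap column and one mismatch out of nine columns.
value = percent_identity("ATG-CCGTA", "ATGACCGAA")
print(round(value, 2))
```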

Percent identity between polynucleotide sequences may be determined using the default 
parameters of the CLUSTAL V algorithm as incorporated into the MEGALIGN version 3.12e 
sequence alignment program. This program is part of the LASERGENE software package, a suite of 
molecular biological analysis programs (DNASTAR, Madison WI). CLUSTAL V is described in 
Higgins, D.G. and Sharp, P.M. (1989) CABIOS 5:151-153 and in Higgins, D.G. et al. (1992) 
CABIOS 8:189-191. For pairwise alignments of polynucleotide sequences, the default parameters are 
set as follows: Ktuple=2, gap penalty=5, window=4, and "diagonals saved"=4. The "weighted" 
residue weight table is selected as the default. Percent identity is reported by CLUSTAL V as the 
"percent similarity" between aligned polynucleotide sequence pairs. 

Alternatively, a suite of commonly used and freely available sequence comparison algorithms 
is provided by the National Center for Biotechnology Information (NCBI) Basic Local Alignment 
Search Tool (BLAST) (Altschul, S.F. et al. (1990) J. Mol. Biol. 215:403-410), which is available 
from several sources, including the NCBI, Bethesda, MD, and on the Internet at 
http://www.ncbi.nlm.nih.gov/BLAST/. The BLAST software suite includes various sequence 
analysis programs including "blastn," which is used to determine alignment between a known 
polynucleotide sequence and other sequences on a variety of databases. Also available is a tool called 
"BLAST 2 Sequences" that is used for direct pairwise comparison of two nucleotide sequences. 
"BLAST 2 Sequences" can be accessed and used interactively at 
http://www.ncbi.nlm.nih.gov/gorf/bl2/. The "BLAST 2 Sequences" tool can be used for both blastn 
and blastp (discussed below). BLAST programs are commonly used with gap and other parameters 
set to default settings. For example, to compare two nucleotide sequences, one may use blastn with 
the "BLAST 2 Sequences" tool Version 2.0.9 (May-07-1999) set at default parameters. Such default 
parameters may be, for example: 
Matrix: BLOSUM62 
Reward for match: 1 
Penalty for mismatch: -2 
Open Gap: 5 and Extension Gap: 2 penalties 
Gap x drop-off: 50 
Expect: 10 
Word Size: 11 
Filter: on 

Percent identity may be measured over the length of an entire defined sequence, for example, 
as defined by a particular SEQ ID number, or may be measured over a shorter length, for example, 
over the length of a fragment taken from a larger, defined sequence, for instance, a fragment of at 
least 20, at least 30, at least 40, at least 50, at least 70, at least 100, or at least 200 contiguous 
nucleotides. Such lengths are exemplary only, and it is understood that any fragment length 
supported by the sequences shown herein, in figures or Sequence Listings, may be used to describe a 
length over which percentage identity may be measured. 

Nucleic acid sequences that do not show a high degree of identity may nevertheless encode 
similar amino acid sequences due to the degeneracy of the genetic code. It is understood that changes 
in nucleic acid sequence can be made using this degeneracy to produce multiple nucleic acid 
sequences that all encode substantially the same protein. 
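
The effect of degeneracy can be illustrated with a short sketch. It builds the standard genetic code table, translates two hypothetical coding fragments that differ at most third-base (and some first-base) positions, and compares their nucleotide identity. The two sequences and all function names are assumptions chosen for illustration only.

```python
# Illustrative sketch only: two hypothetical DNA fragments with low nucleotide
# identity that nevertheless encode the same peptide (standard genetic code).
from itertools import product

_BASES = "TCAG"
_AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(_BASES, repeat=3), _AMINO)
}


def translate(dna: str) -> str:
    """Translate complete codons of dna in frame 0 ('*' marks a stop)."""
    usable = len(dna) - len(dna) % 3
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, usable, 3))


def ungapped_identity(a: str, b: str) -> float:
    """Percent of identical positions between two equal-length sequences."""
    if len(a) != len(b) or not a:
        raise ValueError("sequences must be non-empty and of equal length")
    return 100.0 * sum(x == y for x, y in zip(a, b)) / len(a)


SEQ_A = "ATGCTTTCAAGA"  # hypothetical: Met-Leu-Ser-Arg
SEQ_B = "ATGTTGAGCCGT"  # hypothetical: Met-Leu-Ser-Arg with other codons

print(translate(SEQ_A), translate(SEQ_B))
print(round(ungapped_identity(SEQ_A, SEQ_B), 1))
```

Here fewer than half of the nucleotide positions agree, yet both fragments translate to the same four-residue peptide.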

The phrases "percent identity" and "% identity", as applied to polypeptide sequences, refer to 
the percentage of residue matches between at least two polypeptide sequences aligned using a 
standardized algorithm. Methods of polypeptide sequence alignment are well-known. Some 
alignment methods take into account conservative amino acid substitutions. Such conservative 
substitutions, explained in more detail above, generally preserve the hydrophobicity and acidity of the 
substituted residue, thus preserving the structure (and therefore function) of the folded polypeptide. 

Percent identity between polypeptide sequences may be determined using the default 
parameters of the CLUSTAL V algorithm as incorporated into the MEGALIGN version 3.12e 
sequence alignment program (described and referenced above). For pairwise alignments of 
polypeptide sequences using CLUSTAL V, the default parameters are set as follows: Ktuple=1, gap 
penalty=3, window=5, and "diagonals saved"=5. The PAM250 matrix is selected as the default 
residue weight table. As with polynucleotide alignments, the percent identity is reported by 
CLUSTAL V as the "percent similarity" between aligned polypeptide sequence pairs. 

Alternatively the NCBI BLAST software suite may be used. For example, for a pairwise 
comparison of two polypeptide sequences, one may use the "BLAST 2 Sequences" tool Version 2.0.9 
(May-07-1999) with blastp set at default parameters. Such default parameters may be, for example: 
Matrix: BLOSUM62 
Open Gap: 11 and Extension Gap: 1 penalty 
Gap x drop-off: 50 
Expect: 10 
Word Size: 3 
Filter: on 

Percent identity may be measured over the length of an entire defined polypeptide sequence, 
for example, as defined by a particular SEQ ID number, or may be measured over a shorter length, for 
example, over the length of a fragment taken from a larger, defined polypeptide sequence, for 
instance, a fragment of at least 15, at least 20, at least 30, at least 40, at least 50, at least 70 or at least 
150 contiguous residues. Such lengths are exemplary only, and it is understood that any fragment 
length supported by the sequences shown herein, in figures or Sequence Listings, may be used to 
describe a length over which percentage identity may be measured. 

"Post-translational modification" of an MDDT may involve lipidation, glycosylation, 
phosphorylation, acetylation, racemization, proteolytic cleavage, and other modifications known in 
the art. These processes may occur synthetically or biochemically. Biochemical modifications will 
vary by cell type depending on the enzymatic milieu and the MDDT. 

"Probe" refers to mddt or fragments thereof, which are used to detect identical, allelic or 
related nucleic acid sequences. Probes are isolated oligonucleotides or polynucleotides attached to a 
detectable label or reporter molecule. Typical labels include radioactive isotopes, ligands, 
chemiluminescent agents, and enzymes. "Primers" are short nucleic acids, usually DNA 
oligonucleotides, which may be annealed to a target polynucleotide by complementary base-pairing. 
The primer may then be extended along the target DNA strand by a DNA polymerase enzyme. 
Primer pairs can be used for amplification (and identification) of a nucleic acid sequence, e.g., by the 
polymerase chain reaction (PCR). 

Probes and primers as used in the present invention typically comprise at least 15 contiguous 
nucleotides of a known sequence. In order to enhance specificity, longer probes and primers may also 
be employed, such as probes and primers that comprise at least 20, 30, 40, 50, 60, 70, 80, 90, 100, or 
at least 150 consecutive nucleotides of the disclosed nucleic acid sequences. Probes and primers may 
be considerably longer than these examples, and it is understood that any length supported by the 
specification, including the figures and Sequence Listing, may be used. 

Methods for preparing and using probes and primers are described in the references, for 
example Sambrook et al., 1989, Molecular Cloning: A Laboratory Manual, 2nd ed., vol. 1-3, Cold 
Spring Harbor Press, Plainview NY; Ausubel et al., 1987, Current Protocols in Molecular Biology, 
Greene Publ. Assoc. & Wiley-Intersciences, New York NY; Innis et al., 1990, PCR Protocols: A 
Guide to Methods and Applications, Academic Press, San Diego CA. PCR primer pairs can be 
derived from a known sequence, for example, by using computer programs intended for that purpose 
such as Primer (Version 0.5, 1991, Whitehead Institute for Biomedical Research, Cambridge MA). 

Oligonucleotides for use as primers are selected using software known in the art for such 
purpose. For example, OLIGO 4.06 software is useful for the selection of PCR primer pairs of up to 
100 nucleotides each, and for the analysis of oligonucleotides and larger polynucleotides of up to 
5,000 nucleotides from an input polynucleotide sequence of up to 32 kilobases. Similar primer 
selection programs have incorporated additional features for expanded capabilities. For example, the 
PrimOU primer selection program (available to the public from the Genome Center at University of 
Texas South West Medical Center, Dallas TX) is capable of choosing specific primers from 
megabase sequences and is thus useful for designing primers on a genome-wide scope. The Primer3 
primer selection program (available to the public from the Whitehead Institute/MIT Center for 
Genome Research, Cambridge MA) allows the user to input a "mispriming library," in which 
sequences to avoid as primer binding sites are user-specified. Primer3 is useful, in particular, for the 
selection of oligonucleotides for microarrays. (The source code for the latter two primer selection 
programs may also be obtained from their respective sources and modified to meet the user's specific 
needs.) The PrimeGen program (available to the public from the UK Human Genome Mapping 
Project Resource Centre, Cambridge UK) designs primers based on multiple sequence alignments, 
thereby allowing selection of primers that hybridize to either the most conserved or least conserved 
regions of aligned nucleic acid sequences. Hence, this program is useful for identification of both 
unique and conserved oligonucleotides and polynucleotide fragments. The oligonucleotides and 
polynucleotide fragments identified by any of the above selection methods are useful in hybridization 
technologies, for example, as PCR or sequencing primers, microarray elements, or specific probes to 
identify fully or partially complementary polynucleotides in a sample of nucleic acids. Methods of 
oligonucleotide selection are not limited to those described above. 

"Purified" refers to molecules, either polynucleotides or polypeptides, that are isolated or 
separated from their natural environment and are at least 60% free, preferably at least 75% free, and 
most preferably at least 90% free from other compounds with which they are naturally associated. 

A "recombinant nucleic acid" is a sequence that is not naturally occurring or has a sequence 
that is made by an artificial combination of two or more otherwise separated segments of sequence. 
This artificial combination is often accomplished by chemical synthesis or, more commonly, by the 
artificial manipulation of isolated segments of nucleic acids, e.g., by genetic engineering techniques 
such as those described in Sambrook, supra . The term recombinant includes nucleic acids that have 
been altered solely by addition, substitution, or deletion of a portion of the nucleic acid. Frequently, a 
recombinant nucleic acid may include a nucleic acid sequence operably linked to a promoter 
sequence. Such a recombinant nucleic acid may be part of a vector that is used, for example, to 
transform a cell. 

Alternatively, such recombinant nucleic acids may be part of a viral vector, e.g., based on a 
vaccinia virus, that could be used to vaccinate a mammal wherein the recombinant nucleic acid is 
expressed, inducing a protective immunological response in the mammal. 

"Regulatory element" refers to a nucleic acid sequence from nontranslated regions of a gene, 
and includes enhancers, promoters, introns, and 3' untranslated regions, which interact with host 
proteins to carry out or regulate transcription or translation. 

"Reporter" molecules are chemical or biochemical moieties used for labeling a nucleic acid, 
an amino acid, or an antibody. They include radionuclides; enzymes; fluorescent, chemiluminescent, 
or chromogenic agents; substrates; cofactors; inhibitors; magnetic particles; and other moieties known 
in the art. 

An "RNA equivalent," in reference to a DNA sequence, is composed of the same linear 
sequence of nucleotides as the reference DNA sequence with the exception that all occurrences of the 
nitrogenous base thymine are replaced with uracil, and the sugar backbone is composed of ribose 
instead of deoxyribose. 
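
At the level of sequence text, the RNA equivalent defined above amounts to a base substitution, which can be sketched as follows. The function name is hypothetical, and the sketch represents only the change of letters, not the change of sugar backbone.

```python
# Illustrative sketch only: write the RNA equivalent of a DNA sequence by
# replacing every thymine (T) with uracil (U). The name is an assumption.
def rna_equivalent(dna: str) -> str:
    """Return dna with thymine replaced by uracil, preserving letter case."""
    return dna.replace("T", "U").replace("t", "u")


print(rna_equivalent("ATGCTTGA"))
```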

"Sample" is used in its broadest sense. Samples may contain nucleic or amino acids, 
antibodies, or other materials, and may be derived from any source (e.g., bodily fluids including, but 
not limited to, saliva, blood, and urine; chromosome(s), organelles, or membranes isolated from a 
cell; genomic DNA, RNA, or cDNA in solution or bound to a substrate; and cleared cells or tissues or 
blots or imprints from such cells or tissues). 

"Specific binding" or "specifically binding" refers to the interaction between a protein or 
peptide and its agonist, antibody, antagonist, or other binding partner. The interaction is dependent 
upon the presence of a particular structure of the protein, e.g., the antigenic determinant or epitope, 
recognized by the binding molecule. For example, if an antibody is specific for epitope "A," the 
presence of a polypeptide containing epitope A, or the presence of free unlabeled A, in a reaction 
containing free labeled A and the antibody will reduce the amount of labeled A that binds to the 
antibody. 

"Substitution" refers to the replacement of at least one nucleotide or amino acid by a different 
nucleotide or amino acid. 

"Substrate" refers to any suitable rigid or semi-rigid support including, e.g., membranes, 
filters, chips, slides, wafers, fibers, magnetic or nonmagnetic beads, gels, tubing, plates, polymers, 
microparticles or capillaries. The substrate can have a variety of surface forms, such as wells, 
trenches, pins, channels and pores, to which polynucleotides or polypeptides are bound. 

A "transcript image" refers to the collective pattern of gene expression by a particular tissue 
or cell type under given conditions at a given time. 

"Transformation" refers to a process by which exogenous DNA enters a recipient cell. 
Transformation may occur under natural or artificial conditions using various methods well known in 
the art. Transformation may rely on any known method for the insertion of foreign nucleic acid 
sequences into a prokaryotic or eukaryotic host cell. The method is selected based on the host cell 
being transformed. 

WO 00/75298 PCT/US00/15344 

"Transformants" include stably transformed cells in which the inserted DNA is capable of 
replication either as an autonomously replicating plasmid or as part of the host chromosome, as well 
as cells which transiently express inserted DNA or RNA. 

A "transgenic organism," as used herein, is any organism, including but not limited to animals 
and plants, in which one or more of the cells of the organism contains heterologous nucleic acid 
introduced by way of human intervention, such as by transgenic techniques well known in the art. 
The nucleic acid is introduced into the cell, directly or indirectly by introduction into a precursor of 
the cell, by way of deliberate genetic manipulation, such as by microinjection or by infection with a 
recombinant virus. The term genetic manipulation does not include classical cross-breeding, or in 
vitro fertilization, but rather is directed to the introduction of a recombinant DNA molecule. The 
transgenic organisms contemplated in accordance with the present invention include bacteria, 
cyanobacteria, fungi, and plants and animals. The isolated DNA of the present invention can be 
introduced into the host by methods known in the art, for example infection, transfection, 
transformation or transconjugation. Techniques for transferring the DNA of the present invention 
into such organisms are widely known and provided in references such as Sambrook et al. (1989), 
supra . 

A "variant" of a particular nucleic acid sequence is defined as a nucleic acid sequence having 
at least 25% sequence identity to the particular nucleic acid sequence over a certain length of one of 
the nucleic acid sequences using blastn with the "BLAST 2 Sequences" tool Version 2.0.9 (May-07- 
1999) set at default parameters. Such a pair of nucleic acids may show, for example, at least 30%, at 
least 50%, at least 60%, at least 70%, at least 80%, at least 90%, at least 95% or even at least 98% or 
greater sequence identity over a certain defined length. The variant may result in "conservative" 
amino acid changes which do not affect structural and/or chemical properties. A variant may be 
described as, for example, an "allelic" (as defined above), "splice," "species," or "polymorphic" 
variant. A splice variant may have significant identity to a reference molecule, but will generally 
have a greater or lesser number of polynucleotides due to alternate splicing of exons during mRNA 
processing. The corresponding polypeptide may possess additional functional domains or lack 
domains that are present in the reference molecule. Species variants are polynucleotide sequences 
that vary from one species to another. The resulting polypeptides generally will have significant 
amino acid identity relative to each other. A polymorphic variant is a variation in the polynucleotide 
sequence of a particular gene between individuals of a given species. Polymorphic variants also may 
encompass "single nucleotide polymorphisms" (SNPs) in which the polynucleotide sequence varies 
by one base. The presence of SNPs may be indicative of, for example, a certain population, a disease 
state, or a propensity for a disease state. 

In an alternative, variants of the polynucleotides of the present invention may be generated 
through recombinant methods. One possible method is a DNA shuffling technique such as 
MOLECULARBREEDING (Maxygen Inc., Santa Clara CA; described in U.S. Patent Number 
5,837,458; Chang, C.-C. et al. (1999) Nat. Biotechnol. 17:793-797; Christians, F.C. et al. (1999) Nat. 
Biotechnol. 17:259-264; and Crameri, A. et al. (1996) Nat. Biotechnol. 14:315-319) to alter or 
improve the biological properties of MDDT, such as its biological or enzymatic activity or its ability 
to bind to other molecules or compounds. DNA shuffling is a process by which a library of gene 
variants is produced using PCR-mediated recombination of gene fragments. The library is then 
subjected to selection or screening procedures that identify those gene variants with the desired 
properties. These preferred variants may then be pooled and further subjected to recursive rounds of 
DNA shuffling and selection/screening. Thus, genetic diversity is created through "artificial" 
breeding and rapid molecular evolution. For example, fragments of a single gene containing random 
point mutations may be recombined, screened, and then reshuffled until the desired properties are 
optimized. Alternatively, fragments of a given gene may be recombined with fragments of 
homologous genes in the same gene family, either from the same or different species, thereby 
maximizing the genetic diversity of multiple naturally occurring genes in a directed and controllable 
manner. 

A "variant" of a particular polypeptide sequence is defined as a polypeptide sequence having 
at least 40% sequence identity to the particular polypeptide sequence over a certain length of one of 
the polypeptide sequences using blastp with the "BLAST 2 Sequences" tool Version 2.0.9 (May-07- 
1999) set at default parameters. Such a pair of polypeptides may show, for example, at least 50%, at 
least 60%, at least 70%, at least 80%, at least 90%, at least 95%, or at least 98% or greater sequence 
identity over a certain defined length of one of the polypeptides. 

THE INVENTION 

In a particular embodiment, cDNA sequences derived from human tissues and cell lines were 
aligned based on nucleotide sequence identity and assembled into "consensus" or "template" 
sequences which are designated by the template identification numbers (template IDs) in column 2 of 
Table 1. The sequence identification numbers (SEQ ID NO:s) corresponding to the template IDs are 
shown in column 1. The template sequences have similarity to GenBank sequences, or "hits," as 
designated by the GI Numbers in column 3. The statistical probability of each GenBank hit is 
indicated by a probability score in column 4, and the functional annotation corresponding to each 
GenBank hit is listed in column 5. 

The invention incorporates the nucleic acid sequences of these templates as disclosed in the 
Sequence Listing and the use of these sequences in the diagnosis and treatment of disease states 
characterized by defects in molecules for disease detection and treatment. The invention further 
utilizes these sequences in hybridization and amplification technologies, and in particular, in 
technologies which assess gene expression patterns correlated with specific cells or tissues and their 
responses in vivo or in vitro to pharmaceutical agents, toxins, and other treatments. In this manner, 
the sequences of the present invention are used to develop a transcript image for a particular cell or 
tissue. 

Derivation of Nucleic Acid Sequences 

cDNA was isolated from libraries constructed using RNA derived from normal and diseased 
human tissues and cell lines. The human tissues and cell lines used for cDNA library construction 
were selected from a broad range of sources to provide a diverse population of cDNAs representative 
of gene transcription throughout the human body. Descriptions of the human tissues and cell lines 
used for cDNA library construction are provided in the LIFESEQ database (Incyte Genomics, Inc. 
(Incyte), Palo Alto CA). Human tissues were broadly selected from, for example, cardiovascular, 
dermatologic, endocrine, gastrointestinal, hematopoietic/immune system, musculoskeletal, neural, 
reproductive, and urologic sources. 

Cell lines used for cDNA library construction were derived from, for example, leukemic 
cells, teratocarcinomas, neuroepitheliomas, cervical carcinoma, lung fibroblasts, and endothelial cells. 
Such cell lines include, for example, THP-1, Jurkat, HUVEC, hNT2, WI38, HeLa, and other cell 
lines commonly used and available from public depositories (American Type Culture Collection, 
Manassas VA). Prior to mRNA isolation, cell lines were untreated, treated with a pharmaceutical 
agent such as 5-aza-2'-deoxycytidine, treated with an activating agent such as lipopolysaccharide in 
the case of leukocytic cell lines, or, in the case of endothelial cell lines, subjected to shear stress. 

Sequencing of the cDNAs 

Methods for DNA sequencing are well known in the art. Conventional enzymatic methods 
employ the Klenow fragment of DNA polymerase I, SEQUENASE DNA polymerase (U.S. 
Biochemical Corporation, Cleveland OH), Taq polymerase (PE Biosystems, Foster City CA), 
thermostable T7 polymerase (Amersham Pharmacia Biotech, Inc. (Amersham Pharmacia Biotech), 
Piscataway NJ), or combinations of polymerases and proofreading exonucleases such as those found 
in the ELONGASE amplification system (Life Technologies Inc. (Life Technologies), Gaithersburg 
MD), to extend the nucleic acid sequence from an oligonucleotide primer annealed to the DNA 
template of interest. Methods have been developed for the use of both single-stranded and double-stranded 
templates. Chain termination reaction products may be electrophoresed on urea-polyacrylamide 
gels and detected either by autoradiography (for radioisotope-labeled nucleotides) or 
by fluorescence (for fluorophore-labeled nucleotides). Automated methods for mechanized reaction 
preparation, sequencing, and analysis using fluorescence detection methods have been developed. 
Machines used to prepare cDNAs for sequencing can include the MICROLAB 2200 liquid transfer 
system (Hamilton Company (Hamilton), Reno NV), Peltier thermal cycler (PTC200; MJ Research, 
Inc. (MJ Research), Watertown MA), and ABI CATALYST 800 thermal cycler (PE Biosystems). 
Sequencing can be carried out using, for example, the ABI 373 or 377 (PE Biosystems) or 
MEGABACE 1000 (Molecular Dynamics, Inc. (Molecular Dynamics), Sunnyvale CA) DNA 
sequencing systems, or other automated and manual sequencing systems well known in the art. 

The nucleotide sequences of the Sequence Listing have been prepared by current, state-of-the-art, 
automated methods and, as such, may contain occasional sequencing errors or unidentified 
nucleotides. Such unidentified nucleotides are designated by an N. These infrequent unidentified 
bases do not represent a hindrance to practicing the invention for those skilled in the art. Several 
methods employing standard recombinant techniques may be used to correct errors and complete the 
missing sequence information. (See, e.g., those described in Ausubel, F.M. et al. (1997) Short 
Protocols in Molecular Biology, John Wiley & Sons, New York NY; and Sambrook, J. et al. (1989) 
Molecular Cloning: A Laboratory Manual, Cold Spring Harbor Press, Plainview NY.) 

Assembly of cDNA Sequences 

Human polynucleotide sequences may be assembled using programs or algorithms well 
known in the art. Sequences to be assembled are related, wholly or in part, and may be derived from 
a single or many different transcripts. Assembly of the sequences can be performed using such 
programs as PHRAP (Phil's Revised Assembly Program) and the GELVIEW fragment assembly 
system (GCG), or other methods known in the art. 

Alternatively, cDNA sequences are used as "component" sequences that are assembled into 
"template" or "consensus" sequences as follows. Sequence chromatograms are processed and verified, 
and quality scores are obtained using PHRED. Raw sequences are edited using an editing pathway 
known as Block 1 (See, e.g., the LIFESEQ Assembled User Guide, Incyte Genomics, Palo Alto, CA). 
A series of BLAST comparisons is performed and low-information segments and repetitive elements 
(e.g., dinucleotide repeats, Alu repeats, etc.) are replaced by "n's", or masked, to prevent spurious 
matches. Mitochondrial and ribosomal RNA sequences are also removed. The processed sequences 
are then loaded into a relational database management system (RDMS) which assigns edited 
sequences to existing templates, if available. When additional sequences are added into the RDMS, a 
process is initiated which modifies existing templates or creates new templates from works in 
progress (i.e., nonfinal assembled sequences) containing queued sequences or the sequences 
themselves. After the new sequences have been assigned to templates, the templates can be merged 
into bins. If multiple templates exist in one bin, the bin can be split and the templates reannotated. 
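
The masking step described above can be illustrated with a simplified sketch. It is not the BLAST-based procedure of the editing pathway: it assumes a plain regular-expression search for dinucleotide repeats and an arbitrary threshold of six repeat units, and the function and constant names are hypothetical.

```python
# Illustrative sketch only: mask dinucleotide repeats with "n" so that they
# cannot produce spurious matches. The threshold MIN_UNITS is an assumption.
import re

MIN_UNITS = 6  # assumed minimum number of consecutive repeat units to mask

# One two-base unit followed by at least MIN_UNITS - 1 further copies of it.
_DINUC_RUN = re.compile(r"([ACGT]{2})\1{%d,}" % (MIN_UNITS - 1))


def mask_dinucleotide_repeats(seq: str) -> str:
    """Replace each long dinucleotide repeat in seq with a run of 'n'."""
    return _DINUC_RUN.sub(lambda m: "n" * len(m.group(0)), seq.upper())


raw = "GATTC" + "CA" * 8 + "GGTAC"  # hypothetical read with a (CA)8 repeat
print(mask_dinucleotide_repeats(raw))
```

Masked positions keep their length, so coordinates in the masked sequence still correspond to the original read.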

Once gene bins have been generated based upon sequence alignments, bins are "clone joined" 
based upon clone information. Clone joining occurs when the 5' sequence of one clone is present in 
one bin and the 3' sequence from the same clone is present in a different bin, indicating that the two 
bins should be merged into a single bin. Only bins which share at least two different clones are 
merged. 
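
The clone joining rule can be sketched as follows, under stated assumptions. The input is assumed to be a mapping from a clone identifier to the pair of bins holding its 5' and 3' sequences; two bins are merged when at least two different clones span them. The data structure, the names, and the use of a union-find structure are illustrative choices, not the method of the RDMS.

```python
# Illustrative sketch only: merge bins that share at least two different
# clones. All identifiers below are hypothetical.
from collections import defaultdict


def clone_join(clone_bins, min_shared=2):
    """clone_bins maps clone id -> (bin of 5' read, bin of 3' read)."""
    parent = {}
    shared = defaultdict(set)  # unordered bin pair -> clones spanning it
    for clone, (bin5, bin3) in clone_bins.items():
        parent.setdefault(bin5, bin5)
        parent.setdefault(bin3, bin3)
        if bin5 != bin3:
            shared[frozenset((bin5, bin3))].add(clone)

    def find(b):
        # Follow parent links to the representative bin, halving the path.
        while parent[b] != b:
            parent[b] = parent[parent[b]]
            b = parent[b]
        return b

    for pair, clones in shared.items():
        if len(clones) >= min_shared:
            first, second = pair
            parent[find(first)] = find(second)

    groups = defaultdict(set)
    for b in parent:
        groups[find(b)].add(b)
    return sorted(sorted(group) for group in groups.values())


example = {
    "clone1": ("binA", "binB"),
    "clone2": ("binA", "binB"),  # second clone spanning binA and binB
    "clone3": ("binB", "binC"),  # only one clone spans binB and binC
    "clone4": ("binD", "binD"),
}
print(clone_join(example))
```

In this example binA and binB are merged because two clones span them, while binC stays separate because only one clone links it to binB.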

A resultant template sequence may contain either a partial or a full length open reading 
frame, or all or part of a genetic regulatory element. This variation is due in part to the fact that the 
full length cDNAs of many genes are several hundred, and sometimes several thousand, bases in 
length. With current technology, cDNAs comprising the coding regions of large genes cannot be 
cloned because of vector limitations, incomplete reverse transcription of the mRNA, or incomplete 
"second strand" synthesis. Template sequences may be extended to include additional contiguous 
sequences derived from the parent RNA transcript using a variety of methods known to those of skill 
in the art. Extension may thus be used to achieve the full length coding sequence of a gene. 

Analysis of the cDNA Sequences 

The cDNA sequences are analyzed using a variety of programs and algorithms which are well 
known in the art. (See, e.g., Ausubel, 1997, supra, Chapter 7.7; Meyers, R.A. (Ed.) (1995) Molecular 
Biology and Biotechnology, Wiley VCH, New York NY, pp. 856-853; and Table 5.) These analyses 
comprise reading frame determinations, e.g., based on triplet codon periodicity for particular 
organisms (Fickett, J.W. (1982) Nucleic Acids Res. 10:5303-5318); analyses of potential start and 
stop codons; and homology searches. 
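
The analysis of potential start and stop codons can be illustrated with a minimal sketch. It scans the three forward reading frames for an ATG followed by the first in-frame stop codon. It does not implement the codon periodicity test of Fickett, and the function name, the minimum length, and the example sequence are assumptions.

```python
# Illustrative sketch only: locate candidate open reading frames on the
# forward strand by pairing each ATG with the next in-frame stop codon.
STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_orfs(dna: str, min_codons: int = 2):
    """Return (frame, start, end) tuples; dna[start:end] includes the stop."""
    dna = dna.upper()
    orfs = []
    for frame in range(3):
        start = None
        for i in range(frame, len(dna) - 2, 3):
            codon = dna[i:i + 3]
            if start is None and codon == "ATG":
                start = i  # first start codon of a new candidate ORF
            elif start is not None and codon in STOP_CODONS:
                # Count coding codons before the stop, including the ATG.
                if (i - start) // 3 >= min_codons:
                    orfs.append((frame, start, i + 3))
                start = None
    return orfs


sequence = "CCATGAAATTTTAGGG"  # hypothetical fragment
for frame, start, end in find_orfs(sequence):
    print(frame, start, end, sequence[start:end])
```

A fuller analysis would also scan the complementary strand and weigh candidate frames using codon usage statistics.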

Computer programs known to those of skill in the art for performing computer-assisted 

25 searches for amino acid and nucleic acid sequence similarity, include, for example, Basic Local 

Alignment Search Tool (BLAST; Altschul, S.F. (1993) J. Mol. Evol. 36:290-300; Altschul, S.F. et al. 
(1990) J. Mol. Biol. 215:403-410). BLAST is especially useful in determining exact matches and 
comparing two sequence fragments of arbitrary but equal lengths, whose alignment is locally 
maximal and for which the alignment score meets or exceeds a threshold or cutoff score set by the
user (Karlin, S. et al. (1988) Proc. Natl. Acad. Sci. USA 85:841-845). Using an appropriate search
tool (e.g., BLAST or HMM), GenBank, SwissProt, BLOCKS, PFAM and other databases may be 
searched for sequences containing regions of homology to a query mddt or MDDT of the present 
invention. 
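The idea of a locally maximal segment whose score meets a cutoff can be shown with a toy example. The sketch below is not BLAST and does not use any BLAST library; it only scores two equal-length fragments position by position, without gaps, and reports the highest-scoring contiguous segment. The function name best_segment, the match and mismatch scores, and the cutoff value are all assumptions chosen for illustration.

```python
def best_segment(a, b, match=5, mismatch=-4):
    """Return (score, start, end) of the best ungapped segment.

    `a` and `b` are equal-length fragments compared position by position.
    `end` is exclusive. A score of 0 with start == end == 0 means no
    positively scoring segment was found.
    """
    if len(a) != len(b):
        raise ValueError("fragments must have equal length")
    best = (0, 0, 0)
    run_score, run_start = 0, 0
    for i, (x, y) in enumerate(zip(a, b)):
        if run_score <= 0:
            # A non-positive running score cannot help: start a new segment.
            run_score, run_start = 0, i
        run_score += match if x == y else mismatch
        if run_score > best[0]:
            best = (run_score, run_start, i + 1)
    return best

score, start, end = best_segment("ACGTACGT", "TTGTACAA")
cutoff = 15  # hypothetical user-set threshold
print(score, start, end, score >= cutoff)
```

Here the four matching positions in the middle of the fragments form the maximal segment, and its score of 20 exceeds the assumed cutoff.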

Other approaches to the identification, assembly, storage, and display of nucleotide and 
polypeptide sequences are provided in "Relational Database for Storing Biomolecule Information,"
U.S.S.N. 08/947,845, filed October 9, 1997; "Project-Based Full-Length Biomolecular Sequence
Database," U.S.S.N. 08/811,758, filed March 6, 1997; and "Relational Database and System for
Storing Information Relating to Biomolecular Sequences," U.S.S.N. 09/034,807, filed March 4, 1998,
all of which are incorporated by reference herein in their entirety.

Protein hierarchies can be assigned to the putative encoded polypeptide based on, e.g., motif,
BLAST, or biological analysis. Methods for assigning these hierarchies are described, for example,
in "Database System Employing Protein Function Hierarchies for Viewing Biomolecular Sequence
Data," U.S.S.N. 08/812,290, filed March 6, 1997, incorporated herein by reference.

Human Disease Detection and Treatment Molecule Sequences

The mddt of the present invention may be used for a variety of diagnostic and therapeutic 
purposes. For example, an mddt may be used to diagnose a particular condition, disease, or disorder 
associated with disease detection and treatment molecules. Such conditions, diseases, and disorders 
include, but are not limited to, a cell proliferative disorder, such as actinic keratosis, arteriosclerosis, 
atherosclerosis, bursitis, cirrhosis, hepatitis, mixed connective tissue disease (MCTD), myelofibrosis,
paroxysmal nocturnal hemoglobinuria, polycythemia vera, psoriasis, primary thrombocythemia, and 
cancers including adenocarcinoma, leukemia, lymphoma, melanoma, myeloma, sarcoma, 
teratocarcinoma, and, in particular, a cancer of the adrenal gland, bladder, bone, bone marrow, brain, 
breast, cervix, gall bladder, ganglia, gastrointestinal tract, heart, kidney, liver, lung, muscle, ovary, 
pancreas, parathyroid, penis, prostate, salivary glands, skin, spleen, testis, thymus, thyroid, and
uterus; and an autoimmune/inflammatory disorder, such as actinic keratosis, acquired 
immunodeficiency syndrome (AIDS), Addison's disease, adult respiratory distress syndrome, 
allergies, ankylosing spondylitis, amyloidosis, anemia, arteriosclerosis, asthma, atherosclerosis, 
autoimmune hemolytic anemia, autoimmune thyroiditis, bronchitis, bursitis, cholecystitis, cirrhosis, 
contact dermatitis, Crohn's disease, atopic dermatitis, dermatomyositis, diabetes mellitus,
emphysema, erythroblastosis fetalis, erythema nodosum, atrophic gastritis, glomerulonephritis,
Goodpasture's syndrome, gout, Graves' disease, Hashimoto's thyroiditis, paroxysmal nocturnal 
hemoglobinuria, hepatitis, hypereosinophilia, irritable bowel syndrome, episodic lymphopenia with 
lymphocytotoxins, mixed connective tissue disease (MCTD), multiple sclerosis, myasthenia gravis, 
myocardial or pericardial inflammation, myelofibrosis, osteoarthritis, osteoporosis, pancreatitis,
polycythemia vera, polymyositis, psoriasis, Reiter's syndrome, rheumatoid arthritis, scleroderma, 
Sjogren's syndrome, systemic anaphylaxis, systemic lupus erythematosus, systemic sclerosis, 
primary thrombocythemia, thrombocytopenic purpura, ulcerative colitis, uveitis, Werner syndrome, 
complications of cancer, hemodialysis, and extracorporeal circulation, trauma, and hematopoietic 
cancer including lymphoma, leukemia, and myeloma. The mddt can be used to detect the presence
of, or to quantify the amount of, an mddt-related polynucleotide in a sample. This information is then
compared to information obtained from appropriate reference samples, and a diagnosis is established. 
Alternatively, a polynucleotide complementary to a given mddt can inhibit or inactivate a 
therapeutically relevant gene related to the mddt. 


Analysis of mddt Expression Patterns 

The expression of mddt may be routinely assessed by hybridization-based methods to 
determine, for example, the tissue-specificity, disease-specificity, or developmental stage-specificity 
of mddt expression. For example, the level of expression of mddt may be compared among different
cell types or tissues, among diseased and normal cell types or tissues, among cell types or tissues at
different developmental stages, or among cell types or tissues undergoing various treatments. This 
type of analysis is useful, for example, to assess the relative levels of mddt expression in fully or 
partially differentiated cells or tissues, to determine if changes in mddt expression levels are 
correlated with the development or progression of specific disease states, and to assess the response
of a cell or tissue to a specific therapy, for example, in pharmacological or toxicological studies.
Methods for the analysis of mddt expression are based on hybridization and amplification
technologies and include membrane-based procedures such as northern blot analysis, high-throughput
procedures that utilize, for example, microarrays, and PCR-based procedures.

Hybridization and Genetic Analysis

The mddt, their fragments, or complementary sequences, may be used to identify the presence 
of and/or to determine the degree of similarity between two (or more) nucleic acid sequences. The 
mddt may be hybridized to naturally occurring or recombinant nucleic acid sequences under 
appropriately selected temperatures and salt concentrations. Hybridization with a probe based on the
nucleic acid sequence of at least one of the mddt allows for the detection of nucleic acid sequences,
including genomic sequences, which are identical or related to the mddt of the Sequence Listing. 
Probes may be selected from non-conserved or unique regions of at least one of the polynucleotides 
of SEQ ID NO: 1-14 and tested for their ability to identify or amplify the target nucleic acid sequence 
using standard protocols. 

Polynucleotide sequences that are capable of hybridizing, in particular, to those shown in
SEQ ID NO: 1-14 and fragments thereof, can be identified using various conditions of stringency.
(See, e.g., Wahl, G.M. and S.L. Berger (1987) Methods Enzymol. 152:399-407; Kimmel, A.R. (1987)
Methods Enzymol. 152:507-511.) Hybridization conditions are discussed in "Definitions."

A probe for use in Southern or northern hybridization may be derived from a fragment of an
mddt sequence, or its complement, that is up to several hundred nucleotides in length and is either
single-stranded or double-stranded. Such probes may be hybridized in solution to biological materials
such as plasmids, bacterial, yeast, or human artificial chromosomes, cleared or sectioned tissues, or to 
artificial substrates containing mddt. Microarrays are particularly suitable for identifying the 
presence of and detecting the level of expression for multiple genes of interest by examining gene 
expression correlated with, e.g., various stages of development, treatment with a drug or compound, 
or disease progression. An array analogous to a dot or slot blot may be used to arrange and link 
polynucleotides to the surface of a substrate using one or more of the following: mechanical 
(vacuum), chemical, thermal, or UV bonding procedures. Such an array may contain any number of 
mddt and may be produced by hand or by using available devices, materials, and machines. 

Microarrays may be prepared, used, and analyzed using methods known in the art. (See, e.g., 
Brennan, T.M. et al. (1995) U.S. Patent No. 5,474,796; Schena, M. et al. (1996) Proc. Natl. Acad. Sci. 
USA 93:10614-10619; Baldeschweiler et al. (1995) PCT application WO95/251116; Shalon, D. et al.
(1995) PCT application WO95/35505; Heller, R.A. et al. (1997) Proc. Natl. Acad. Sci. USA 94:2150- 
2155; and Heller, M.J. et al. (1997) U.S. Patent No. 5,605,662.)

Probes may be labeled by either PCR or enzymatic techniques using a variety of
commercially available reporter molecules. For example, commercial kits are available for
radioactive and chemiluminescent labeling (Amersham Pharmacia Biotech) and for alkaline
phosphatase labeling (Life Technologies). Alternatively, mddt may be cloned into commercially
available vectors for the production of RNA probes. Such probes may be transcribed in the presence
of at least one labeled nucleotide (e.g., 32P-ATP, Amersham Pharmacia Biotech).

Additionally, the polynucleotides of SEQ ID NO: 1-14 or suitable fragments thereof can be
used to isolate full length cDNA sequences utilizing hybridization and/or amplification procedures
well known in the art, e.g., cDNA library screening, PCR amplification, etc. The molecular cloning
of such full length cDNA sequences may employ the method of cDNA library screening with probes
using the hybridization, stringency, washing, and probing strategies described above and in Ausubel,
supra, Chapters 3, 5, and 6. These procedures may also be employed with genomic libraries to isolate
genomic sequences of mddt in order to analyze, e.g., regulatory elements.



Genetic Mapping 

Gene identification and mapping are important in the investigation and treatment of almost all
conditions, diseases, and disorders. Cancer, cardiovascular disease, Alzheimer's disease, arthritis,
diabetes, and mental illnesses are of particular interest. Each of these conditions is more complex
than the single gene defects of sickle cell anemia or cystic fibrosis, with select groups of genes being
predictive of predisposition for a particular condition, disease, or disorder. For example,
cardiovascular disease may result from malfunctioning receptor molecules that fail to clear
cholesterol from the bloodstream, and diabetes may result when a particular individual's immune
system is activated by an infection and attacks the insulin-producing cells of the pancreas. In some
studies, Alzheimer's disease has been linked to a gene on chromosome 21; other studies predict a
different gene and location. Mapping of disease genes is a complex and reiterative process and
generally proceeds from genetic linkage analysis to physical mapping.

As a condition is noted among members of a family, a genetic linkage map traces parts of 
chromosomes that are inherited in the same pattern as the condition. Statistics link the inheritance of 
particular conditions to particular regions of chromosomes, as defined by RFLP or other markers. 
(See, for example, Lander, E.S. and Botstein, D. (1986) Proc. Natl. Acad. Sci. USA 83:7353-7357.)
Occasionally, genetic markers and their locations are known from previous studies. More often,
however, the markers are simply stretches of DNA that differ among individuals. Examples of 
genetic linkage maps can be found in various scientific journals or at the Online Mendelian 
Inheritance in Man (OMIM) World Wide Web site. 

In another embodiment of the invention, mddt sequences may be used to generate 
hybridization probes useful in chromosomal mapping of naturally occurring genomic sequences.
Either coding or noncoding sequences of mddt may be used, and in some instances, noncoding 
sequences may be preferable over coding sequences. For example, conservation of an mddt coding 
sequence among members of a multi-gene family may potentially cause undesired cross hybridization 
during chromosomal mapping. The sequences may be mapped to a particular chromosome, to a 
specific region of a chromosome, or to artificial chromosome constructions, e.g., human artificial
chromosomes (HACs), yeast artificial chromosomes (YACs), bacterial artificial chromosomes
(BACs), bacterial P1 constructions, or single chromosome cDNA libraries. (See, e.g., Harrington, J.J.
et al. (1997) Nat. Genet. 15:345-355; Price, C.M. (1993) Blood Rev. 7:127-134; and Trask, B.J.
(1991) Trends Genet. 7:149-154.)

Fluorescent in situ hybridization (FISH) may be correlated with other physical chromosome
mapping techniques and genetic map data. (See, e.g., Meyers, supra, pp. 965-968.) Correlation
between the location of mddt on a physical chromosomal map and a specific disorder, or a
predisposition to a specific disorder, may help define the region of DNA associated with that
disorder. The mddt sequences may also be used to detect polymorphisms that are genetically linked
to the inheritance of a particular condition, disease, or disorder.

In situ hybridization of chromosomal preparations and genetic mapping techniques, such as 
linkage analysis using established chromosomal markers, may be used for extending existing genetic 
maps. Often the placement of a gene on the chromosome of another mammalian species, such as 
mouse, may reveal associated markers even if the number or arm of the corresponding human 
chromosome is not known. These new marker sequences can be mapped to human chromosomes and
may provide valuable information to investigators searching for disease genes using positional
cloning or other gene discovery techniques. Once a disease or syndrome has been crudely correlated
by genetic linkage with a particular genomic region, e.g., ataxia-telangiectasia to 11q22-23, any
sequences mapping to that area may represent associated or regulatory genes for further investigation.
(See, e.g., Gatti, R.A. et al. (1988) Nature 336:577-580.) The nucleotide sequences of the subject
invention may also be used to detect differences in chromosomal architecture due to translocation, 
inversion, etc., among normal, carrier, or affected individuals. 

Once a disease-associated gene is mapped to a chromosomal region, the gene must be cloned 
in order to identify mutations or other alterations (e.g., translocations or inversions) that may be 
correlated with disease. This process requires a physical map of the chromosomal region containing
the disease-gene of interest along with associated markers. A physical map is necessary for 
determining the nucleotide sequence of and order of marker genes on a particular chromosomal 
region. Physical mapping techniques are well known in the art and require the generation of 
overlapping sets of cloned DNA fragments from a particular organelle, chromosome, or genome. 
These clones are analyzed to reconstruct and catalog their order. Once the position of a marker is
determined, the DNA from that region is obtained by consulting the catalog and selecting clones from
that region. The gene of interest is located through positional cloning techniques using hybridization 
or similar methods. 



Diagnostic Uses

The mddt of the present invention may be used to design probes useful in diagnostic assays. 
Such assays, well known to those skilled in the art, may be used to detect or confirm conditions, 
disorders, or diseases associated with abnormal levels of mddt expression. Labeled probes developed 
from mddt sequences are added to a sample under hybridizing conditions of desired stringency. In 
some instances, mddt, or fragments or oligonucleotides derived from mddt, may be used as primers in
amplification steps prior to hybridization. The amount of hybridization complex formed is quantified 
and compared with standards for that cell or tissue. If mddt expression varies significantly from the 
standard, the assay indicates the presence of the condition, disorder, or disease. Qualitative or 
quantitative diagnostic methods may include northern, dot blot, or other membrane or dip-stick based
technologies or multiple-sample format technologies such as PCR, enzyme-linked immunosorbent
assay (ELISA)-like, pin, or chip-based assays.
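The comparison of a quantified hybridization signal with a standard can be sketched as follows. The text does not define "varies significantly," so this example assumes one possible criterion: the sample lies at least a chosen number of standard deviations from the mean of the reference measurements. The function name, the z_cutoff default, and the signal values are hypothetical.

```python
from statistics import mean, stdev

def deviates_from_standard(sample_signal, reference_signals, z_cutoff=2.0):
    """Flag a sample whose signal lies far from the reference distribution.

    `reference_signals` needs at least two values so that a sample
    standard deviation can be computed.
    """
    mu = mean(reference_signals)
    sigma = stdev(reference_signals)
    if sigma == 0:
        # All reference values are identical: any difference counts.
        return sample_signal != mu
    return abs(sample_signal - mu) / sigma >= z_cutoff

reference = [10, 12, 11, 9, 13]   # hypothetical signals from normal tissue
print(deviates_from_standard(20, reference))
print(deviates_from_standard(11.5, reference))
```

Other statistical tests could be substituted; the cutoff would in practice be chosen and validated for the particular assay and tissue.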

The probes described above may also be used to monitor the progress of conditions, 
disorders, or diseases associated with abnormal levels of mddt expression, or to evaluate the efficacy 
of a particular therapeutic treatment. The candidate probe may be identified from the mddt that are
specific to a given human tissue and have not been observed in GenBank or other genome databases.
Such a probe may be used in animal studies, preclinical tests, clinical trials, or in monitoring the
treatment of an individual patient. In a typical process, standard expression is established by methods
well known in the art for use as a basis of comparison, samples from patients affected by the disorder
or disease are combined with the probe to evaluate any deviation from the standard profile, and a
therapeutic agent is administered and effects are monitored to generate a treatment profile. Efficacy
is evaluated by determining whether the expression progresses toward or returns to the standard
normal pattern. Treatment profiles may be generated over a period of several days or several months.
Statistical methods well known to those skilled in the art may be used to determine the significance of
such therapeutic agents.

The polynucleotides are also useful for identifying individuals from minute biological
samples, for example, by matching the RFLP pattern of a sample's DNA to that of an individual's
DNA. The polynucleotides of the present invention can also be used to determine the actual
base-by-base DNA sequence of selected portions of an individual's genome. These sequences can be
used to prepare PCR primers for amplifying and isolating such selected DNA, which can then be
sequenced. Using this technique, an individual can be identified through a unique set of DNA
sequences. Once a unique ID database is established for an individual, positive identification of that
individual can be made from extremely small tissue samples. 

In a particular aspect, oligonucleotide primers derived from the mddt of the invention may be 
used to detect single nucleotide polymorphisms (SNPs). SNPs are substitutions, insertions and 
deletions that are a frequent cause of inherited or acquired genetic disease in humans. Methods of
SNP detection include, but are not limited to, single-stranded conformation polymorphism (SSCP) 
and fluorescent SSCP (fSSCP) methods. In SSCP, oligonucleotide primers derived from the 
polynucleotide sequences encoding MDDT are used to amplify DNA using the polymerase chain 
reaction (PCR). The DNA may be derived, for example, from diseased or normal tissue, biopsy 
samples, bodily fluids, and the like. SNPs in the DNA cause differences in the secondary and tertiary
structures of PCR products in single-stranded form, and these differences are detectable using gel
electrophoresis in non-denaturing gels. In fSSCP, the oligonucleotide primers are fluorescently
labeled, which allows detection of the amplimers in high-throughput equipment such as DNA
sequencing machines. Additionally, sequence database analysis methods, termed in silico SNP
(isSNP), are capable of identifying polymorphisms by comparing the sequence of individual
overlapping DNA fragments which assemble into common consensus sequences. These computer-based
methods filter out sequence variations due to laboratory preparation of DNA and sequencing
errors using statistical models and automated analyses of DNA sequence chromatograms. In the
alternative, SNPs may be detected and characterized by mass spectrometry using, for example, the
high throughput MASSARRAY system (Sequenom, Inc., San Diego CA).

DNA-based identification techniques are critical in forensic technology. DNA sequences
taken from very small biological samples such as tissues, e.g., hair or skin, or body fluids, e.g., blood,
saliva, semen, etc., can be amplified using, e.g., PCR, to identify individuals. (See, e.g., Erlich, H.
(1992) PCR Technology, Freeman and Co., New York, NY). Similarly, polynucleotides of the
present invention can be used as polymorphic markers. 
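The in silico SNP (isSNP) comparison of overlapping fragments described above can be sketched in a simplified form. This is an illustration under stated assumptions, not the statistical model or chromatogram analysis the text refers to: the reads are assumed to be already aligned to equal length, with "-" marking positions a read does not cover, and a variant is kept only if the minor base is seen in at least min_minor_count reads, as a crude stand-in for filtering out isolated sequencing errors. The function and parameter names are hypothetical.

```python
from collections import Counter

def candidate_snps(aligned_reads, min_minor_count=2):
    """Return (position, major_base, minor_base) for candidate polymorphisms.

    Positions are 0-based columns of the alignment. A column is reported
    only when its second most common base occurs in at least
    `min_minor_count` reads.
    """
    snps = []
    for pos in range(len(aligned_reads[0])):
        counts = Counter(r[pos] for r in aligned_reads if r[pos] in "ACGT")
        if len(counts) < 2:
            continue  # no variation observed in this column
        (major, _), (minor, minor_n) = counts.most_common(2)
        if minor_n >= min_minor_count:
            snps.append((pos, major, minor))
    return snps

reads = [
    "ACGTAC",
    "ACGTAC",
    "ACGTAC",
    "ACATAC",
    "ACATAC",   # G/A difference at column 2 seen in two reads: kept
    "ACGTTC",   # A/T difference at column 4 seen once: treated as an error
]
print(candidate_snps(reads))
```

A variant seen in a single read is discarded here, whereas one supported by several independent fragments is reported as a candidate.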

There is also a need for reagents capable of identifying the source of a particular tissue. 
Appropriate reagents can comprise, for example, DNA probes or primers prepared from the 
sequences of the present invention that are specific for particular tissues. Panels of such reagents can 
identify tissue by species and/or by organ type. In a similar fashion, these reagents can be used to 
screen tissue cultures for contamination. 

The polynucleotides of the present invention can also be used as molecular weight markers on 
nucleic acid gels or Southern blots, as diagnostic probes for the presence of a specific mRNA in a 
particular cell type, in the creation of subtracted cDNA libraries which aid in the discovery of novel 
polynucleotides, in selection and synthesis of oligomers for attachment to an array or other support, 
and as an antigen to elicit an immune response. 

Disease Model Systems Using mddt 

The mddt of the invention or their mammalian homologs may be "knocked out" in an animal 
model system using homologous recombination in embryonic stem (ES) cells. Such techniques are 
well known in the art and are useful for the generation of animal models of human disease. (See, e.g., 
U.S. Patent Number 5,175,383 and U.S. Patent Number 5,767,337.) For example, mouse ES cells, 
such as the mouse 129/SvJ cell line, are derived from the early mouse embryo and grown in culture. 
The ES cells are transformed with a vector containing the gene of interest disrupted by a marker gene, 
e.g., the neomycin phosphotransferase gene (neo; Capecchi, M.R. (1989) Science 244:1288-1292).
The vector integrates into the corresponding region of the host genome by homologous 
recombination. Alternatively, homologous recombination takes place using the Cre-loxP system to
knock out a gene of interest in a tissue- or developmental stage-specific manner (Marth, J.D. (1996)
J. Clin. Invest. 97:1999-2002; Wagner, K.U. et al. (1997) Nucleic Acids Res. 25:4323-4330).
Transformed ES cells are identified and microinjected into mouse cell blastocysts such as those from 
the C57BL/6 mouse strain. The blastocysts are surgically transferred to pseudopregnant dams, and 
the resulting chimeric progeny are genotyped and bred to produce heterozygous or homozygous 
strains. Transgenic animals thus generated may be tested with potential therapeutic or toxic agents. 

The mddt of the invention may also be manipulated in vitro in ES cells derived from human 
blastocysts. Human ES cells have the potential to differentiate into at least eight separate cell 
lineages including endoderm, mesoderm, and ectodermal cell types. These cell lineages differentiate
into, for example, neural cells, hematopoietic lineages, and cardiomyocytes (Thomson, J.A. et al.
(1998) Science 282:1145-1147). 

The mddt of the invention can also be used to create "knockin" humanized animals (pigs) or 
transgenic animals (mice or rats) to model human disease. With knockin technology, a region of 
mddt is injected into animal ES cells, and the injected sequence integrates into the animal cell
genome. Transformed cells are injected into blastulae, and the blastulae are implanted as described
above. Transgenic progeny or inbred lines are studied and treated with potential pharmaceutical 
agents to obtain information on treatment of a human disease. Alternatively, a mammal inbred to 
overexpress mddt, resulting, e.g., in the secretion of MDDT in its milk, may also serve as a 
convenient source of that protein (Janne, J. et al. (1998) Biotechnol. Annu. Rev. 4:55-74).

Screening Assays 

MDDT encoded by polynucleotides of the present invention may be used to screen for 
molecules that bind to or are bound by the encoded polypeptides. The binding of the polypeptide and
the molecule may activate (agonist), increase, inhibit (antagonist), or decrease activity of the
polypeptide or the bound molecule. Examples of such molecules include antibodies, 
oligonucleotides, proteins (e.g., receptors), or small molecules. 

Preferably, the molecule is closely related to the natural ligand of the polypeptide, e.g., a
ligand or fragment thereof, a natural substrate, or a structural or functional mimetic. (See Coligan et
al. (1991) Current Protocols in Immunology 1(2): Chapter 5.) Similarly, the molecule can be closely
related to the natural receptor to which the polypeptide binds, or to at least a fragment of the receptor, 
e.g., the active site. In either case, the molecule can be rationally designed using known techniques. 
Preferably, the screening for these molecules involves producing appropriate cells which express the 
polypeptide, either as a secreted protein or on the cell membrane. Preferred cells include cells from
mammals, yeast, Drosophila, or E. coli. Cells expressing the polypeptide or cell membrane fractions
which contain the expressed polypeptide are then contacted with a test compound and binding, 
stimulation, or inhibition of activity of either the polypeptide or the molecule is analyzed. 

An assay may simply test binding of a candidate compound to the polypeptide, wherein 
binding is detected by a fluorophore, radioisotope, enzyme conjugate, or other detectable label.
Alternatively, the assay may assess binding in the presence of a labeled competitor.

Additionally, the assay can be carried out using cell-free preparations, polypeptide/molecule 
affixed to a solid support, chemical libraries, or natural product mixtures. The assay may also simply 
comprise the steps of mixing a candidate compound with a solution containing a polypeptide, 
measuring polypeptide/molecule activity or binding, and comparing the polypeptide/molecule activity
or binding to a standard.

Preferably, an ELISA assay using, e.g., a monoclonal or polyclonal antibody, can measure
polypeptide level in a sample. The antibody can measure polypeptide level by either binding, directly 
or indirectly, to the polypeptide or by competing with the polypeptide for a substrate. 

All of the above assays can be used in a diagnostic or prognostic context. The molecules 
discovered using these assays can be used to treat disease or to bring about a particular result in a
patient (e.g., blood vessel growth) by activating or inhibiting the polypeptide/molecule. Moreover, the 
assays can discover agents which may inhibit or enhance the production of the polypeptide from 
suitably manipulated cells or tissues. 

Transcript Imaging

Another embodiment relates to the use of mddt to develop a transcript image of a tissue or 
cell type. A transcript image is the collective pattern of gene expression by a particular tissue or cell 
type under given conditions and at a given time. This pattern of gene expression is defined by the 
number of expressed genes, their abundance, and their function. Thus the mddt of the present
invention may be used to develop a transcript image of a tissue or cell type by hybridizing, preferably
in a microarray format, the mddt of the present invention to the totality of transcripts or reverse 
transcripts of a tissue or cell type. The resultant transcript image would provide a profile of gene 
activity pertaining to disease detection and treatment. 

Transcript images which profile mddt expression may be generated using transcripts isolated
from tissues, cell lines, biopsies, or other biological samples. The transcript image may thus reflect
mddt expression in vivo, as in the case of a tissue or biopsy sample, or in vitro, as in the case of a cell
line. Transcript images may be used to profile mddt expression in distinct tissue types. This process 
can be used to determine disease detection and treatment molecule activity in a particular tissue type 
relative to this activity in a different tissue type. Transcript images may be used to generate a profile
of mddt expression characteristic of diseased tissue. Transcript images of tissues before and after
treatment may be used for diagnostic purposes, to monitor the progression of disease, and to monitor 
the efficacy of drug treatments for diseases which affect the activity of disease detection and 
treatment molecules. 
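One way to picture a transcript image and its comparison across samples is as a table of relative transcript abundances. The sketch below is only an illustration: the gene names, the signal values, the floor used to avoid division by zero, and the choice of a log2 ratio as the comparison measure are all assumptions introduced here rather than details given in the text.

```python
import math

def transcript_image(signals):
    """Convert raw per-gene signals into relative abundances summing to 1."""
    total = sum(signals.values())
    return {gene: value / total for gene, value in signals.items()}

def log2_fold_changes(image_a, image_b, floor=1e-6):
    """Compare two transcript images gene by gene (image_b relative to image_a).

    Genes missing from one image are treated as having abundance `floor`.
    """
    genes = sorted(set(image_a) | set(image_b))
    return {
        gene: math.log2(
            max(image_b.get(gene, 0.0), floor) / max(image_a.get(gene, 0.0), floor)
        )
        for gene in genes
    }

# Hypothetical hybridization signals for two tissue samples.
normal = transcript_image({"mddt_A": 10, "mddt_B": 30, "other": 60})
diseased = transcript_image({"mddt_A": 40, "mddt_B": 30, "other": 30})

for gene, change in log2_fold_changes(normal, diseased).items():
    print(gene, round(change, 2))
```

The same comparison could be applied to images taken before and after treatment, with a change back toward the reference image suggesting a response.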

Transcript images which profile mddt expression may also be used in conjunction with in 
vitro model systems and preclinical evaluation of pharmaceuticals. Transcript images of cell lines
can be used to assess disease detection and treatment molecule activity and/or to identify cell lines 
that lack or misregulate this activity. Such cell lines may then be treated with pharmaceutical agents, 
and a transcript image following treatment may indicate the efficacy of these agents in restoring 
desired levels of this activity. A similar approach may be used to assess the toxicity of 
pharmaceutical agents as reflected by undesirable changes in disease detection and treatment
molecule activity. Candidate pharmaceutical agents may be evaluated by comparing their associated
transcript images with those of pharmaceutical agents of known effectiveness. 

Antisense Molecules 

The polynucleotides of the present invention are useful in antisense technology. Antisense 
technology or therapy relies on the modulation of expression of a target protein through the specific 
binding of an antisense sequence to a target sequence encoding the target protein or directing its 
expression. (See, e.g., Agrawal, S., ed. (1996) Antisense Therapeutics, Humana Press Inc., Totowa
NJ; Alama, A. et al. (1997) Pharmacol. Res. 36(3):171-178; Crooke, S.T. (1997) Adv. Pharmacol.
40:1-49; Sharma, H.W. and R. Narayanan (1995) Bioessays 17(12):1055-1063; and Lavrosky, Y. et
al. (1997) Biochem. Mol. Med. 62(1):11-22.) An antisense sequence is a polynucleotide sequence
capable of specifically hybridizing to at least a portion of the target sequence. Antisense sequences
bind to cellular mRNA and/or genomic DNA, affecting translation and/or transcription. Antisense
sequences can be DNA, RNA, or nucleic acid mimics and analogs. (See, e.g., Rossi, J.J. et al. (1991)
Antisense Res. Dev. 1(3):285-288; Lee, R. et al. (1998) Biochemistry 37(3):900-1010; Pardridge,
W.M. et al. (1995) Proc. Natl. Acad. Sci. USA 92(12):5592-5596; and Nielsen, P.E. and Haaima, G.
(1997) Chem. Soc. Rev. 96:73-78.) Typically, the binding which results in modulation of expression
occurs through hybridization or binding of complementary base pairs. Antisense sequences can also 
bind to DNA duplexes through specific interactions in the major groove of the double helix. 
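Binding through complementary base pairs means that an antisense RNA for a given stretch of mRNA is its reverse complement. The short sketch below illustrates only that relationship; the function name and the example sequence are hypothetical, and choosing an effective antisense target in practice involves considerations not modeled here.

```python
# Watson-Crick pairing for RNA bases.
RNA_COMPLEMENT = str.maketrans("ACGU", "UGCA")

def antisense_rna(target_mrna):
    """Return the reverse complement of an mRNA segment, written 5' to 3'."""
    return target_mrna.upper().translate(RNA_COMPLEMENT)[::-1]

target = "AUGGCC"             # hypothetical target segment, 5' to 3'
print(antisense_rna(target))  # GGCCAU
```

Reading the result 3' to 5' alongside the target 5' to 3' pairs each base with its complement.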

The polynucleotides of the present invention and fragments thereof can be used as antisense 
sequences to modify the expression of the polypeptide encoded by mddt. The antisense sequences 
can be produced ex vivo, such as by using any of the ABI nucleic acid synthesizer series (PE
Biosystems) or other automated systems known in the art. Antisense sequences can also be produced 
biologically, such as by transforming an appropriate host cell with an expression vector containing 
the sequence of interest. (See, e.g., Agrawal, supra.)

In therapeutic use, any gene delivery system suitable for introduction of the antisense 
sequences into appropriate target cells can be used. Antisense sequences can be delivered 
intracellularly in the form of an expression plasmid which, upon transcription, produces a sequence 
complementary to at least a portion of the cellular sequence encoding the target protein. (See, e.g., 
Slater, J.E. et al. (1998) J. Allergy Clin. Immunol. 102(3):469-475; and Scanlon, K.J. et al. (1995)
9(13):1288-1296.) Antisense sequences can also be introduced intracellularly through the use of viral
vectors, such as retrovirus and adeno-associated virus vectors. (See, e.g., Miller, A.D. (1990) Blood
76:271; Ausubel, F.M. et al. (1995) Current Protocols in Molecular Biology, John Wiley & Sons,
New York NY; Uckert, W. and W. Walther (1994) Pharmacol. Ther. 63(3):323-347.) Other gene
delivery mechanisms include liposome-derived systems, artificial viral envelopes, and other systems
known in the art. (See, e.g., Rossi, J.J. (1995) Br. Med. Bull. 51(1):217-225; Boado, R.J. et al. (1998)
J. Pharm. Sci. 87(11):1308-1315; and Morris, M.C. et al. (1997) Nucleic Acids Res.
25(14):2730-2736.)

Expression 

In order to express a biologically active MDDT, the nucleotide sequences encoding MDDT or 
fragments thereof may be inserted into an appropriate expression vector, i.e., a vector which contains 
the necessary elements for transcriptional and translational control of the inserted coding sequence in 
a suitable host. Methods which are well known to those skilled in the art may be used to construct 
expression vectors containing sequences encoding MDDT and appropriate transcriptional and 
translational control elements. These methods include in vitro recombinant DNA techniques, 
synthetic techniques, and in vivo genetic recombination. (See, e.g., Sambrook, supra, Chapters 4, 8,
16, and 17; and Ausubel, supra, Chapters 9, 10, 13, and 16.)

A variety of expression vector/host systems may be utilized to contain and express sequences 
encoding MDDT. These include, but are not limited to, microorganisms such as bacteria transformed 
with recombinant bacteriophage, plasmid, or cosmid DNA expression vectors; yeast transformed with 
yeast expression vectors; insect cell systems infected with viral expression vectors (e.g., baculovirus); 
plant cell systems transformed with viral expression vectors (e.g., cauliflower mosaic virus, CaMV, 
or tobacco mosaic virus, TMV) or with bacterial expression vectors (e.g., Ti or pBR322 plasmids); or 
animal (mammalian) cell systems. (See, e.g., Sambrook, supra; Ausubel, 1995, supra; Van Heeke, G.
and S.M. Schuster (1989) J. Biol. Chem. 264:5503-5509; Bitter, G.A. et al. (1987) Methods Enzymol. 
153:516-544; Scorer, C.A. et al. (1994) Bio/Technology 12:181-184; Engelhard, E.K. et al. (1994) 
Proc. Natl. Acad. Sci. USA 91:3224-3227; Sandig, V. et al. (1996) Hum. Gene Ther. 7:1937-1945; 
Takamatsu, N. (1987) EMBO J. 6:307-311; Coruzzi, G. et al. (1984) EMBO J. 3:1671-1680; Broglie,
R. et al. (1984) Science 224:838-843; Winter, J. et al. (1991) Results Probl. Cell Differ. 17:85-105; 
The McGraw Hill Yearbook of Science and Technology (1992) McGraw Hill, New York NY, pp. 
191-196; Logan, J. and T. Shenk (1984) Proc. Natl. Acad. Sci. USA 81:3655-3659; and Harrington, 
J.J. et al. (1997) Nat. Genet. 15:345-355.) Expression vectors derived from retroviruses, 
adenoviruses, or herpes or vaccinia viruses, or from various bacterial plasmids, may be used for 
delivery of nucleotide sequences to the targeted organ, tissue, or cell population. (See, e.g., Di 
Nicola, M. et al. (1998) Cancer Gen. Ther. 5(6):350-356; Yu, M. et al., (1993) Proc. Natl. Acad. Sci. 
USA 90(13):6340-6344; Buller, R.M. et al. (1985) Nature 317(6040):813-815; McGregor, D.P. et al. 
(1994) Mol. Immunol. 31(3):219-226; and Verma, I.M. and N. Somia (1997) Nature 389:239-242.)
The invention is not limited by the host cell employed.

For long-term production of recombinant proteins in mammalian systems, stable expression
of MDDT in cell lines is preferred. For example, sequences encoding MDDT can be transformed into
cell lines using expression vectors which may contain viral origins of replication and/or endogenous
expression elements and a selectable marker gene on the same or on a separate vector. Any number
of selection systems may be used to recover transformed cell lines. (See, e.g., Wigler, M. et al.
(1977) Cell 11:223-232; Lowy, I. et al. (1980) Cell 22:817-823; Wigler, M. et al. (1980) Proc. Natl.
Acad. Sci. USA 77:3567-3570; Colbere-Garapin, F. et al. (1981) J. Mol. Biol. 150:1-14; Hartman,
S.C. and R.C. Mulligan (1988) Proc. Natl. Acad. Sci. USA 85:8047-8051; Rhodes, C.A. (1995)
Methods Mol. Biol. 55:121-131.)

Therapeutic Uses of mddt 

The mddt of the invention may be used for somatic or germline gene therapy. Gene therapy
may be performed to (i) correct a genetic deficiency (e.g., in the cases of severe combined
immunodeficiency (SCID)-X1 disease characterized by X-linked inheritance (Cavazzana-Calvo, M. et
al. (2000) Science 288:669-672), severe combined immunodeficiency syndrome associated with an
inherited adenosine deaminase (ADA) deficiency (Blaese, R.M. et al. (1995) Science 270:475-480;
Bordignon, C. et al. (1995) Science 270:470-475), cystic fibrosis (Zabner, J. et al. (1993) Cell
75:207-216; Crystal, R.G. et al. (1995) Hum. Gene Therapy 6:643-666; Crystal, R.G. et al. (1995) Hum.
Gene Therapy 6:667-703), thalassemias, familial hypercholesterolemia, and hemophilia resulting from
Factor VIII or Factor IX deficiencies (Crystal, R.G. (1995) Science 270:404-410; Verma, I.M. and
Somia, N. (1997) Nature 389:239-242)), (ii) express a conditionally lethal gene product (e.g., in the
case of cancers which result from unregulated cell proliferation), or (iii) express a protein which
affords protection against intracellular parasites (e.g., against human retroviruses, such as human
immunodeficiency virus (HIV) (Baltimore, D. (1988) Nature 335:395-396; Poeschla, E. et al. (1996)
Proc. Natl. Acad. Sci. USA 93:11395-11399), hepatitis B or C virus (HBV, HCV); fungal parasites,
such as Candida albicans and Paracoccidioides brasiliensis; and protozoan parasites such as
Plasmodium falciparum and Trypanosoma cruzi). In the case where a genetic deficiency in mddt
expression or regulation causes disease, the expression of mddt from an appropriate population of
transduced cells may alleviate the clinical manifestations caused by the genetic deficiency.

In a further embodiment of the invention, diseases or disorders caused by deficiencies in
mddt are treated by constructing mammalian expression vectors comprising mddt and introducing
these vectors by mechanical means into mddt-deficient cells. Mechanical transfer technologies for
use with cells in vivo or ex vitro include (i) direct DNA microinjection into individual cells, (ii)
ballistic gold particle delivery, (iii) liposome-mediated transfection, (iv) receptor-mediated gene
transfer, and (v) the use of DNA transposons (Morgan, R.A. and Anderson, W.F. (1993) Annu. Rev.
Biochem. 62:191-217; Ivics, Z. (1997) Cell 91:501-510; Boulay, J-L. and Recipon, H. (1998) Curr.
Opin. Biotechnol. 9:445-450).

Expression vectors that may be effective for the expression of mddt include, but are not
limited to, the PCDNA 3.1, EPITAG, PRCCMV2, PREP, PVAX vectors (Invitrogen, Carlsbad CA),
PCMV-SCRIPT, PCMV-TAG, PEGSH/PERV (Stratagene, La Jolla CA), and PTET-OFF,
PTET-ON, PTRE2, PTRE2-LUC, PTK-HYG (Clontech, Palo Alto CA). The mddt of the invention
may be expressed using (i) a constitutively active promoter (e.g., from cytomegalovirus (CMV),
Rous sarcoma virus (RSV), SV40 virus, thymidine kinase (TK), or β-actin genes), (ii) an inducible
promoter (e.g., the tetracycline-regulated promoter (Gossen, M. and Bujard, H. (1992) Proc. Natl.
Acad. Sci. U.S.A. 89:5547-5551; Gossen, M. et al. (1995) Science 268:1766-1769; Rossi, F.M.V.
and Blau, H.M. (1998) Curr. Opin. Biotechnol. 9:451-456), commercially available in the T-REX
plasmid (Invitrogen); the ecdysone-inducible promoter (available in the plasmids PVGRXR and
PIND; Invitrogen); the FK506/rapamycin inducible promoter; or the RU486/mifepristone inducible
promoter (Rossi, F.M.V. and Blau, H.M., supra)), or (iii) a tissue-specific promoter or the native
promoter of the endogenous gene encoding MDDT from a normal individual.

Commercially available liposome transformation kits (e.g., the PERFECT LIPID 
TRANSFECTION KIT, available from Invitrogen) allow one with ordinary skill in the art to deliver 
polynucleotides to target cells in culture and require minimal effort to optimize experimental 
parameters. In the alternative, transformation is performed using the calcium phosphate method
(Graham, F.L. and Eb, A.J. (1973) Virology 52:456-467), or by electroporation (Neumann, E. et al.
(1982) EMBO J. 1:841-845). The introduction of DNA to primary cells requires modification of
these standardized mammalian transfection protocols. 

In another embodiment of the invention, diseases or disorders caused by genetic defects with 
respect to mddt expression are treated by constructing a retrovirus vector consisting of (i) the mddt of
the invention under the control of an independent promoter or the retrovirus long terminal repeat
(LTR) promoter, (ii) appropriate RNA packaging signals, and (iii) a Rev-responsive element (RRE)
along with additional retrovirus cis-acting RNA sequences and coding sequences required for
efficient vector propagation. Retrovirus vectors (e.g., PFB and PFBNEO) are commercially available
(Stratagene) and are based on published data (Riviere, I. et al. (1995) Proc. Natl. Acad. Sci. U.S.A.
92:6733-6737), incorporated by reference herein. The vector is propagated in an appropriate vector
producing cell line (VPCL) that expresses an envelope gene with a tropism for receptors on the target
cells or a promiscuous envelope protein such as VSVg (Armentano, D. et al. (1987) J. Virol.
61:1647-1650; Bender, M.A. et al. (1987) J. Virol. 61:1639-1646; Adam, M.A. and Miller, A.D. (1988)
J. Virol. 62:3802-3806; Dull, T. et al. (1998) J. Virol. 72:8463-8471; Zufferey, R. et al. (1998) J. Virol.
72:9873-9880). U.S. Patent Number 5,910,434 to Rigg ("Method for obtaining retrovirus packaging
cell lines producing high transducing efficiency retroviral supernatant") discloses a method for
obtaining retrovirus packaging cell lines and is hereby incorporated by reference. Propagation of
retrovirus vectors, transduction of a population of cells (e.g., CD4+ T-cells), and the return of
transduced cells to a patient are procedures well known to persons skilled in the art of gene therapy
and have been well documented (Ranga, U. et al. (1997) J. Virol. 71:7020-7029; Bauer, G. et al.
(1997) Blood 89:2259-2267; Bonyhadi, M.L. (1997) J. Virol. 71:4707-4716; Ranga, U. et al. (1998)
Proc. Natl. Acad. Sci. U.S.A. 95:1201-1206; Su, L. (1997) Blood 89:2283-2290).

In the alternative, an adenovirus-based gene therapy delivery system is used to deliver mddt 
to cells which have one or more genetic abnormalities with respect to the expression of mddt. The 
construction and packaging of adenovirus-based vectors are well known to those with ordinary skill in 
the art. Replication defective adenovirus vectors have proven to be versatile for importing genes 
encoding immunoregulatory proteins into intact islets in the pancreas (Csete, M.E. et al. (1995) 
Transplantation 27:263-268). Potentially useful adenoviral vectors are described in U.S. Patent 
Number 5,707,618 to Armentano ("Adenovirus vectors for gene therapy"), hereby incorporated by 
reference. For adenoviral vectors, see also Antinozzi, P.A. et al. (1999) Annu. Rev. Nutr. 19:511-544
and Verma, I.M. and Somia, N. (1997) Nature 18:389:239-242, both incorporated by reference herein. 

In another alternative, a herpes-based, gene therapy delivery system is used to deliver mddt to 
target cells which have one or more genetic abnormalities with respect to the expression of mddt. 
The use of herpes simplex virus (HSV)-based vectors may be especially valuable for introducing 
mddt to cells of the central nervous system, for which HSV has a tropism. The construction and 
packaging of herpes-based vectors are well known to those with ordinary skill in the art. A 
replication-competent herpes simplex virus (HSV) type 1-based vector has been used to deliver a
reporter gene to the eyes of primates (Liu, X. et al. (1999) Exp. Eye Res. 169:385-395). The 
construction of a HSV-1 virus vector has also been disclosed in detail in U.S. Patent Number 
5,804,413 to DeLuca ("Herpes simplex virus strains for gene transfer"), which is hereby incorporated 
by reference. U.S. Patent Number 5,804,413 teaches the use of recombinant HSV d92 which consists 
of a genome containing at least one exogenous gene to be transferred to a cell under the control of the 
appropriate promoter for purposes including human gene therapy. Also taught by this patent are the 
construction and use of recombinant HSV strains deleted for ICP4, ICP27 and ICP22. For HSV 
vectors, see also Goins, W.F. et al. (1999) J. Virol. 73:519-532 and Xu, H. et al. (1994) Dev. Biol.
163:152-161, hereby incorporated by reference. The manipulation of cloned herpesvirus sequences, 
the generation of recombinant virus following the transfection of multiple plasmids containing 
different segments of the large herpesvirus genomes, the growth and propagation of herpesvirus, and 
the infection of cells with herpesvirus are techniques well known to those of ordinary skill in the art.

In another alternative, an alphavirus (positive, single-stranded RNA virus) vector is used to
deliver mddt to target cells. The biology of the prototypic alphavirus, Semliki Forest Virus (SFV),
has been studied extensively and gene transfer vectors have been based on the SFV genome (Garoff,
H. and Li, K-J. (1998) Curr. Opin. Biotech. 9:464-469). During alphavirus RNA replication, a
subgenomic RNA is generated that normally encodes the viral capsid proteins. This subgenomic
RNA replicates to higher levels than the full-length genomic RNA, resulting in the overproduction of
capsid proteins relative to the viral proteins with enzymatic activity (e.g., protease and polymerase).
Similarly, inserting mddt into the alphavirus genome in place of the capsid-coding region results in
the production of a large number of mddt RNAs and the synthesis of high levels of MDDT in vector
transduced cells. While alphavirus infection is typically associated with cell lysis within a few days,
the ability to establish a persistent infection in hamster normal kidney cells (BHK-21) with a variant
of Sindbis virus (SIN) indicates that the lytic replication of alphaviruses can be altered to suit the
needs of the gene therapy application (Dryga, S.A. et al. (1997) Virology 228:74-83). The wide host
range of alphaviruses will allow the introduction of MDDT into a variety of cell types. The specific
transduction of a subset of cells in a population may require the sorting of cells prior to transduction.
The methods of manipulating infectious cDNA clones of alphaviruses, performing alphavirus cDNA 
and RNA transfections, and performing alphavirus infections, are well known to those with ordinary 
skill in the art. 

Antibodies

Anti-MDDT antibodies may be used to analyze protein expression levels. Such antibodies 
include, but are not limited to, polyclonal, monoclonal, chimeric, single chain, and Fab fragments. 
For descriptions of and protocols for antibody technologies, see, e.g., Pound, J.D. (1998)
Immunochemical Protocols, Humana Press, Totowa, NJ.

The amino acid sequence encoded by the mddt of the Sequence Listing may be analyzed by
appropriate software (e.g., LASERGENE NAVIGATOR software, DNASTAR) to determine regions
of high immunogenicity. The optimal sequences for immunization are selected from the C-terminus,
the N-terminus, and those intervening, hydrophilic regions of the polypeptide which are likely to be
exposed to the external environment when the polypeptide is in its natural conformation. Analysis
used to select appropriate epitopes is also described by Ausubel (1997, supra, Chapter 11.7). Peptides
used for antibody induction do not need to have biological activity; however, they must be antigenic.
Peptides used to induce specific antibodies may have an amino acid sequence consisting of at least five
amino acids, preferably at least 10 amino acids, and most preferably 15 amino acids. A peptide which
mimics an antigenic fragment of the natural polypeptide may be fused with another protein such as
keyhole limpet hemocyanin (KLH; Sigma, St. Louis MO) for antibody production. A peptide
encompassing an antigenic region may be expressed from an mddt, synthesized as described above, or
purified from human cells.
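
By way of illustration only, the selection of a hydrophilic candidate region can be sketched as a
sliding-window scan. The sketch below is not the LASERGENE algorithm named above; it substitutes
the published Kyte-Doolittle hydropathy scale as a stand-in, and the function name, the default
15-residue window, and the test sequence are hypothetical choices made for this example.

```python
# Illustrative sketch only. Kyte-Doolittle hydropathy values are used as a
# stand-in for the commercial software named in the text; lower = more hydrophilic.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def most_hydrophilic_window(protein: str, window: int = 15):
    """Return (start, peptide, mean_hydropathy) for the most hydrophilic window."""
    if len(protein) < window:
        raise ValueError("protein is shorter than the window")
    best = None
    for start in range(len(protein) - window + 1):
        peptide = protein[start:start + window]
        mean = sum(KYTE_DOOLITTLE[aa] for aa in peptide) / window
        # Keep the first window with the lowest mean hydropathy.
        if best is None or mean < best[2]:
            best = (start, peptide, mean)
    return best
```

A scan of this kind only ranks candidate regions by hydrophilicity; whether a selected peptide is
antigenic must still be established experimentally as described in the text.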

Procedures well known in the art may be used for the production of antibodies. Various hosts 
including mice, goats, and rabbits, may be immunized by injection with a peptide. Depending on the 
host species, various adjuvants may be used to increase immunological response.

In one procedure, peptides about 15 residues in length may be synthesized using an ABI 
431A peptide synthesizer (PE Biosystems) using fmoc-chemistry and coupled to KLH (Sigma) by
reaction with m-maleimidobenzoyl-N-hydroxysuccinimide ester (Ausubel, 1995, supra). Rabbits are
immunized with the peptide-KLH complex in complete Freund's adjuvant. The resulting antisera are
tested for antipeptide activity by binding the peptide to plastic, blocking with 1% bovine serum
albumin (BSA), reacting with rabbit antisera, washing, and reacting with radioiodinated goat anti- 
rabbit IgG. Antisera with antipeptide activity are tested for anti-MDDT activity using protocols well 
known in the art, including ELISA, radioimmunoassay (RIA), and immunoblotting. 

In another procedure, isolated and purified peptide may be used to immunize mice (about 100 
µg of peptide) or rabbits (about 1 mg of peptide). Subsequently, the peptide is radioiodinated and
used to screen the immunized animals' B-lymphocytes for production of antipeptide antibodies.
Positive cells are then used to produce hybridomas using standard techniques. About 20 mg of
peptide is sufficient for labeling and screening several thousand clones. Hybridomas of interest are
detected by screening with radioiodinated peptide to identify those fusions producing peptide-specific
monoclonal antibody. In a typical protocol, wells of a multi-well plate (FAST, Becton-Dickinson,
Palo Alto, CA) are coated with affinity-purified, specific rabbit-anti-mouse (or suitable anti-species 
IgG) antibodies at 10 mg/ml. The coated wells are blocked with 1% BSA and washed and exposed to 
supernatants from hybridomas. After incubation, the wells are exposed to radiolabeled peptide at 1 
mg/ml. 

Clones producing antibodies bind a quantity of labeled peptide that is detectable above
background. Such clones are expanded and subjected to 2 cycles of cloning. Cloned hybridomas are
injected into pristane-treated mice to produce ascites, and monoclonal antibody is purified from the
ascitic fluid by affinity chromatography on protein A (Amersham Pharmacia Biotech). Several
procedures for the production of monoclonal antibodies, including in vitro production, are described
in Pound (supra). Monoclonal antibodies with antipeptide activity are tested for anti-MDDT activity
using protocols well known in the art, including ELISA, RIA, and immunoblotting.

Antibody fragments containing specific binding sites for an epitope may also be generated. 
For example, such fragments include, but are not limited to, the F(ab')2 fragments produced by pepsin 
digestion of the antibody molecule, and the Fab fragments generated by reducing the disulfide bridges
of the F(ab')2 fragments. Alternatively, construction of Fab expression libraries in filamentous
bacteriophage allows rapid and easy identification of monoclonal fragments with desired specificity
(Pound, supra, Chaps. 45-47). Antibodies generated against polypeptide encoded by mddt can be used
to purify and characterize full-length MDDT protein and its activity, binding partners, etc. 

Assays Using Antibodies 

Anti-MDDT antibodies may be used in assays to quantify the amount of MDDT found in a 
particular human cell. Such assays include methods utilizing the antibody and a label to detect 
expression level under normal or disease conditions. The peptides and antibodies of the invention 
may be used with or without modification or labeled by joining them, either covalently or 
noncovalently, with a reporter molecule. 

Protocols for detecting and measuring protein expression using either polyclonal or 
monoclonal antibodies are well known in the art. Examples include ELISA, RIA, and
fluorescence-activated cell sorting (FACS). Such immunoassays typically involve the formation of
complexes between the MDDT and its specific antibody and the measurement of such complexes.
These and other assays are described in Pound (supra).

Without further elaboration, it is believed that one skilled in the art can, using the preceding 
description, utilize the present invention to its fullest extent. The following preferred specific 
embodiments are, therefore, to be construed as merely illustrative, and not limitative of the remainder 
of the disclosure in any way whatsoever. 

The disclosures of all patents, applications, and publications mentioned above and below, in 
particular U.S. Provisional Application No. 60/137,412, filed June 3, 1999, U.S. Provisional 
Application No. 60/147,542, filed August 5, 1999, U.S. Provisional Application No. 60/147,501, filed 
August 5, 1999, and U.S. Provisional Application No. 60/147,500, filed August 5, 1999, are hereby
expressly incorporated by reference. 

EXAMPLES 

I. Construction of cDNA Libraries 

RNA was purchased from CLONTECH Laboratories, Inc. (Palo Alto CA) or isolated from 
various tissues. Some tissues were homogenized and lysed in guanidinium isothiocyanate, while 
others were homogenized and lysed in phenol or in a suitable mixture of denaturants, such as 
TRIZOL (Life Technologies), a monophasic solution of phenol and guanidine isothiocyanate. The 
resulting lysates were centrifuged over CsCl cushions or extracted with chloroform. RNA was 
precipitated with either isopropanol or sodium acetate and ethanol, or by other routine methods. 

Phenol extraction and precipitation of RNA were repeated as necessary to increase RNA 
purity. In most cases, RNA was treated with DNase. For most libraries, poly(A+) RNA was isolated
using oligo d(T)-coupled paramagnetic particles (Promega Corporation (Promega), Madison WI),
OLIGOTEX latex particles (QIAGEN, Inc. (QIAGEN), Valencia CA), or an OLIGOTEX mRNA 
purification kit (QIAGEN). Alternatively, RNA was isolated directly from tissue lysates using other 
RNA isolation kits, e.g., the POLY(A)PURE mRNA purification kit (Ambion, Inc., Austin TX). 

In some cases, Stratagene was provided with RNA and constructed the corresponding cDNA
libraries. Otherwise, cDNA was synthesized and cDNA libraries were constructed with the UNIZAP
vector system (Stratagene Cloning Systems, Inc. (Stratagene), La Jolla CA) or SUPERSCRIPT
plasmid system (Life Technologies), using the recommended procedures or similar methods known in
the art. (See, e.g., Ausubel, 1997, supra, Chapters 5.1 through 6.6.) Reverse transcription was
initiated using oligo d(T) or random primers. Synthetic oligonucleotide adapters were ligated to
double stranded cDNA, and the cDNA was digested with the appropriate restriction enzyme or
enzymes. For most libraries, the cDNA was size-selected (300-1000 bp) using SEPHACRYL S1000,
SEPHAROSE CL2B, or SEPHAROSE CL4B column chromatography (Amersham Pharmacia
Biotech) or preparative agarose gel electrophoresis. cDNAs were ligated into compatible restriction
enzyme sites of the polylinker of a suitable plasmid, e.g., PBLUESCRIPT plasmid (Stratagene),
pSPORT1 plasmid (Life Technologies), or pINCY (Incyte). Recombinant plasmids were transformed
into competent E. coli cells including XL1-Blue, XL1-BlueMRF, or SOLR from Stratagene or DH5α,
DH10B, or ElectroMAX DH10B from Life Technologies.

II. Isolation of cDNA Clones

Plasmids were recovered from host cells by in vivo excision using the UNIZAP vector system 
(Stratagene) or by cell lysis. Plasmids were purified using at least one of the following: the Magic or 
WIZARD Minipreps DNA purification system (Promega); the AGTC Miniprep purification kit (Edge 
BioSystems, Gaithersburg MD); and the QIAWELL 8, QIAWELL 8 Plus, and QIAWELL 8 Ultra
plasmid purification systems or the R.E.A.L. PREP 96 plasmid purification kit (QIAGEN).
Following precipitation, plasmids were resuspended in 0.1 ml of distilled water and stored, with or
without lyophilization, at 4°C. 

Alternatively, plasmid DNA was amplified from host cell lysates using direct link PCR in a 
high-throughput format (Rao, V.B. (1994) Anal. Biochem. 216:1-14). Host cell lysis and thermal
cycling steps were carried out in a single reaction mixture. Samples were processed and stored in
384-well plates, and the concentration of amplified plasmid DNA was quantified fluorometrically
using PICOGREEN dye (Molecular Probes, Inc. (Molecular Probes), Eugene OR) and a
FLUOROSKAN II fluorescence scanner (Labsystems Oy, Helsinki, Finland).

III. Sequencing and Analysis

cDNA sequencing reactions were processed using standard methods or high-throughput
instrumentation such as the ABI CATALYST 800 thermal cycler (PE Biosystems) or the PTC-200
thermal cycler (MJ Research) in conjunction with the HYDRA microdispenser (Robbins Scientific
Corp., Sunnyvale CA) or the MICROLAB 2200 liquid transfer system (Hamilton). cDNA sequencing
reactions were prepared using reagents provided by Amersham Pharmacia Biotech or supplied in ABI
sequencing kits such as the ABI PRISM BIGDYE Terminator cycle sequencing ready reaction kit (PE
Biosystems). Electrophoretic separation of cDNA sequencing reactions and detection of labeled
polynucleotides were carried out using the MEGABACE 1000 DNA sequencing system (Molecular
Dynamics); the ABI PRISM 373 or 377 sequencing system (PE Biosystems) in conjunction with
standard ABI protocols and base calling software; or other sequence analysis systems known in the
art. Reading frames within the cDNA sequences were identified using standard methods (reviewed in
Ausubel, 1997, supra, Chapter 7.7). Some of the cDNA sequences were selected for extension using
the techniques disclosed in Example VIII.

IV. Assembly and Analysis of Sequences

Component sequences from chromatograms were subject to PHRED analysis and assigned a
quality score. The sequences having at least a required quality score were subject to various
pre-processing editing pathways to eliminate, e.g., low quality 3' ends, vector and linker sequences,
polyA tails, Alu repeats, mitochondrial and ribosomal sequences, bacterial contamination sequences,
and sequences smaller than 50 base pairs. In particular, low-information sequences and repetitive
elements (e.g., dinucleotide repeats, Alu repeats, etc.) were replaced by "n's", or masked, to prevent
spurious matches.
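
By way of illustration only, two of the pre-processing steps described above (discarding
sequences smaller than 50 base pairs and masking dinucleotide repeats with "n") can be sketched as
follows. The function names, the minimum of six repeat units, and the regular expression are
assumptions made for this example; the actual editing pathways are not specified in this section.

```python
import re

MIN_LENGTH = 50  # sequences smaller than 50 base pairs are eliminated (from the text)


def mask_dinucleotide_repeats(seq: str, min_units: int = 6) -> str:
    """Replace runs of a repeated 2-base unit with 'n' characters.

    min_units is a hypothetical threshold; the text does not state one.
    Homopolymer runs (e.g. AAAAAAAAAAAA) also match, since AA is a 2-base unit.
    """
    pattern = re.compile(r"([ACGT]{2})\1{%d,}" % (min_units - 1))
    return pattern.sub(lambda m: "n" * len(m.group(0)), seq)


def preprocess(sequences):
    """Drop short sequences, then mask repeats in the remainder."""
    kept = [s.upper() for s in sequences if len(s) >= MIN_LENGTH]
    return [mask_dinucleotide_repeats(s) for s in kept]
```

Masking preserves sequence length and coordinates, so positions reported for a masked sequence
still refer to the original read.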

Processed sequences were then subject to assembly procedures in which the sequences were 
assigned to gene bins (bins). Each sequence could only belong to one bin. Sequences in each gene
bin were assembled to produce consensus sequences (templates). Subsequent new sequences were 
added to existing bins using BLASTn (v. 1.4 WashU) and CROSSMATCH. Candidate pairs were 
identified as all BLAST hits having a quality score greater than or equal to 150. Alignments of at 
least 82% local identity were accepted into the bin. The component sequences from each bin were 
assembled using a version of PHRAP. Bins with several overlapping component sequences were
assembled using DEEP PHRAP. The orientation (sense or antisense) of each assembled template was
determined based on the number and orientation of its component sequences. Template sequences as 
disclosed in the sequence listing correspond to sense strand sequences (the "forward" reading 
frames), to the best determination. The complementary (antisense) strands are inherently disclosed
herein. The component sequences which were used to assemble each template consensus sequence
are listed in Table 4, along with their positions along the template nucleotide sequences. 
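
By way of illustration only, the acceptance criteria stated above (a quality score greater than or
equal to 150 and at least 82% local identity) can be sketched as a simple filter. The Hit record,
the function name, and the tie-breaking rule of choosing the highest-scoring qualifying bin are
assumptions made for this example; the text states only the thresholds and that each sequence
belongs to one bin.

```python
from dataclasses import dataclass

MIN_SCORE = 150       # candidate pairs: quality score >= 150 (from the text)
MIN_IDENTITY = 82.0   # accepted alignments: >= 82% local identity (from the text)


@dataclass
class Hit:
    """Hypothetical record for one alignment of a new sequence to an existing bin."""
    bin_id: str
    score: float
    identity: float  # percent local identity


def assign_bin(hits, new_bin_id):
    """Return the single bin for a new sequence, or new_bin_id if nothing qualifies."""
    candidates = [
        h for h in hits
        if h.score >= MIN_SCORE and h.identity >= MIN_IDENTITY
    ]
    if not candidates:
        return new_bin_id
    # Assumption: when several bins qualify, take the highest-scoring one.
    return max(candidates, key=lambda h: h.score).bin_id
```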

Bins were compared against each other and those having local similarity of at least 82% were 
combined and reassembled. Reassembled bins having templates of insufficient overlap (less than 
95% local identity) were re-split. Assembled templates were also subject to analysis by 
STITCHER/EXON MAPPER algorithms which analyze the probabilities of the presence of splice
variants, alternatively spliced exons, splice junctions, differential expression of alternatively spliced
genes across tissue types or disease states, etc. These resulting bins were subject to several rounds of 
the above assembly procedures. 

Once gene bins were generated based upon sequence alignments, bins were clone joined 
based upon clone information. If the 5' sequence of one clone was present in one bin and the 3'
sequence from the same clone was present in a different bin, it was likely that the two bins actually 
belonged together in a single bin. The resulting combined bins underwent assembly procedures to 
regenerate the consensus sequences. 
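
By way of illustration only, clone joining as described above can be sketched with a union-find
(disjoint-set) structure, a standard technique substituted here for whatever implementation was
actually used. The function name, the read and bin identifiers, and the convention of keeping the
lexicographically smaller bin identifier are hypothetical.

```python
def clone_join(bin_of_read, clone_ends):
    """Merge bins that hold the 5' and 3' reads of the same clone.

    bin_of_read: dict mapping read id -> bin id
    clone_ends:  iterable of (five_prime_read_id, three_prime_read_id) pairs
    Returns a dict mapping read id -> merged bin id.
    """
    parent = {b: b for b in bin_of_read.values()}

    def find(b):
        # Follow parent links to the representative bin, compressing the path.
        while parent[b] != b:
            parent[b] = parent[parent[b]]
            b = parent[b]
        return b

    for five, three in clone_ends:
        if five in bin_of_read and three in bin_of_read:
            root_a = find(bin_of_read[five])
            root_b = find(bin_of_read[three])
            if root_a != root_b:
                # Assumption: keep the smaller identifier as the merged bin name.
                parent[max(root_a, root_b)] = min(root_a, root_b)

    return {read: find(b) for read, b in bin_of_read.items()}
```

The merged bins returned by such a step would then be reassembled to regenerate their consensus
sequences, as the text describes.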

The final assembled templates were subsequently annotated using the following procedure. 
Template sequences were analyzed using BLASTn (v2.0, NCBI) versus gbpri (GenBank version 116).
"Hits" were defined as an exact match having from 95% local identity over 200 base pairs through
100% local identity over 100 base pairs, or a homolog match having an E-value, i.e. a probability
score, of ≤ 1 x 10^-8. The hits were subject to frameshift FASTx versus GENPEPT (GenBank version
116). (See Table 5.) In this analysis, a homolog match was defined as having an E-value of
≤ 1 x 10^-8. The assembly method used above was described in "System and Methods for Analyzing
Biomolecular Sequences," U.S.S.N. 09/276,534, filed March 25, 1999, and the LIFESEQ Gold user 
manual (Incyte) both incorporated by reference herein. 
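
By way of illustration only, the hit definitions above can be sketched as a small classifier. The
text gives two endpoints for an exact match (95% identity over 200 base pairs and 100% identity
over 100 base pairs); the linear interpolation between those endpoints is an assumption made for
this example, as are the function names and return labels.

```python
E_VALUE_CUTOFF = 1e-8  # homolog match: E-value <= 1 x 10^-8 (from the text)


def required_length(identity: float) -> float:
    """Minimum aligned length (bp) for an exact match at a given percent identity.

    Assumption: the requirement falls linearly from 200 bp at 95% identity
    to 100 bp at 100% identity. The text states only the two endpoints.
    """
    return 200.0 - (identity - 95.0) * 20.0


def classify_hit(identity: float, length: int, e_value: float):
    """Return 'exact', 'homolog', or None for a single alignment."""
    if 95.0 <= identity <= 100.0 and length >= required_length(identity):
        return "exact"
    if e_value <= E_VALUE_CUTOFF:
        return "homolog"
    return None
```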

Following assembly, template sequences were subjected to motif, BLAST, and functional 
analyses, and categorized in protein hierarchies using methods described in, e.g., "Database System 
Employing Protein Function Hierarchies for Viewing Biomolecular Sequence Data," U.S.S.N. 
08/812,290, filed March 6, 1997; "Relational Database for Storing Biomolecule Information," 
U.S.S.N. 08/947,845, filed October 9, 1997; "Project-Based Full-Length Biomolecular Sequence 
Database," U.S.S.N. 08/811,758, filed March 6, 1997; and "Relational Database and System for
Storing Information Relating to Biomolecular Sequences," U.S.S.N. 09/034,807, filed March 4, 1998, 
all of which are incorporated by reference herein. 

The template sequences were further analyzed by translating each template in all three 
forward reading frames and searching each translation against the Pfam database of hidden Markov 
model-based protein families and domains using the HMMER software package (available to the 
public from Washington University School of Medicine, St. Louis MO). Regions of templates which,
when translated, contain similarity to Pfam consensus sequences are reported in Table 2, along with
descriptions of Pfam protein domains and families. Only those Pfam hits with an E-value of ≤ 1 x 10^-3
are reported. (See also World Wide Web site http://pfam.wustl.edu/ for detailed descriptions of Pfam 
protein domains and families.) 
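
The three-frame translation step described above can be sketched as follows. This is a minimal illustration only, assuming the standard genetic code; the function name translate_forward_frames is hypothetical and is not part of HMMER or of the procedure as actually practiced. Each resulting translation would then be searched against the Pfam models with a separate tool.

```python
# Standard genetic code, laid out in the conventional T, C, A, G ordering.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def translate_forward_frames(template):
    """Return {frame: translation} for forward frames 1, 2, and 3.

    Codons containing characters other than A, C, G, or T are written as 'X';
    stop codons are written as '*'.
    """
    seq = template.upper()
    frames = {}
    for offset in range(3):
        codons = (seq[i:i + 3] for i in range(offset, len(seq) - 2, 3))
        frames[offset + 1] = "".join(CODON_TABLE.get(c, "X") for c in codons)
    return frames
```

For example, the nine-base sequence ATGGCCTAA translates to "MA*" in forward frame 1, "WP" in frame 2, and "GL" in frame 3.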

Additionally, the template sequences were translated in all three forward reading frames, and
each translation was searched against hidden Markov models for signal peptide and transmembrane
domains using the HMMER software package. Construction of hidden Markov models and their 
usage in sequence analysis has been described. (See, for example, Eddy, S.R. (1996) Curr. Opin. Str. 
Biol. 6:361-365.) Regions of templates which, when translated, contain similarity to signal peptide or
transmembrane domain consensus sequences are reported in Table 3. Only those signal peptide or
transmembrane hits with a cutoff score of 11 bits or greater are reported. A cutoff score of 11 bits or
greater corresponds to at least about 91-94% true-positives in signal peptide prediction, and at least 
about 75% true-positives in transmembrane domain prediction. 

The results of HMMER analysis as reported in Tables 2 and 3 may support the results of
BLAST analysis as reported in Table 1 or may suggest alternative or additional properties of
template-encoded polypeptides not previously uncovered by BLAST or other analyses. 

Template sequences are further analyzed using the bioinformatics tools listed in Table 5, or 
using sequence analysis software known in the art such as MACDNASIS PRO software (Hitachi 
Software Engineering, South San Francisco CA) and LASERGENE software (DNASTAR).
Template sequences may be further queried against public databases such as the GenBank rodent,
mammalian, vertebrate, prokaryote, and eukaryote databases. 

V. Analysis of Polynucleotide Expression 

Northern analysis is a laboratory technique used to detect the presence of a transcript of a 
gene and involves the hybridization of a labeled nucleotide sequence to a membrane on which RNAs
from a particular cell type or tissue have been bound. (See, e.g., Sambrook, supra, ch. 7; Ausubel,
1995, supra, ch. 4 and 16.) 

Analogous computer techniques applying BLAST were used to search for identical or related 
molecules in cDNA databases such as GenBank or LIFESEQ (Incyte Pharmaceuticals). This analysis 
is much faster than multiple membrane-based hybridizations. In addition, the sensitivity of the
computer search can be modified to determine whether any particular match is categorized as exact or
similar. The basis of the search is the product score, which is defined as:

          BLAST Score x Percent Identity
    ----------------------------------------------
    5 x minimum {length(Seq. 1), length(Seq. 2)}

The product score takes into account both the degree of similarity between two sequences and the
length of the sequence match. The product score is a normalized value between 0 and 100, and is 
calculated as follows: the BLAST score is multiplied by the percent nucleotide identity and the 
product is divided by (5 times the length of the shorter of the two sequences). The BLAST score is 
calculated by assigning a score of +5 for every base that matches in a high-scoring segment pair 
(HSP), and -4 for every mismatch. Two sequences may share more than one HSP (separated by 
gaps). If there is more than one HSP, then the pair with the highest BLAST score is used to calculate 
the product score. The product score represents a balance between fractional overlap and quality in a 
BLAST alignment. For example, a product score of 100 is produced only for 100% identity over the 
entire length of the shorter of the two sequences being compared. A product score of 70 is produced 
either by 100% identity and 70% overlap at one end, or by 88% identity and 100% overlap at the 
other. A product score of 50 is produced either by 100% identity and 50% overlap at one end, or 79% 
identity and 100% overlap. 
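
The product score calculation can be illustrated with a short sketch. It assumes an ungapped HSP described only by its counts of matching and mismatching bases, and it takes percent identity over the HSP; the function names blast_score and product_score are illustrative and are not taken from any BLAST distribution.

```python
def blast_score(matches, mismatches):
    """Score one HSP with +5 per matching base and -4 per mismatch."""
    return 5 * matches - 4 * mismatches


def product_score(matches, mismatches, len_seq1, len_seq2):
    """BLAST score x percent identity / (5 x length of the shorter sequence)."""
    hsp_length = matches + mismatches
    if hsp_length == 0:
        raise ValueError("the HSP must contain at least one aligned base")
    percent_identity = 100.0 * matches / hsp_length
    shorter = min(len_seq1, len_seq2)
    return blast_score(matches, mismatches) * percent_identity / (5 * shorter)
```

With a shorter sequence of 200 bases, 200 matches and no mismatches give a product score of 100, and 140 matches with no mismatches (70% overlap) give 70. An HSP covering all 200 bases at 88% identity (176 matches, 24 mismatches) gives about 69, and one at 79% identity (158 matches, 42 mismatches) gives about 49, in line with the approximate values of 70 and 50 stated above.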

VI. Tissue Distribution Profiling 

A tissue distribution profile is determined for each template by compiling the cDNA library 
tissue classifications of its component cDNA sequences. Each component sequence is derived from
a cDNA library constructed from a human tissue. Each human tissue is classified into one of the 
following categories: cardiovascular system; connective tissue; digestive system; embryonic 
structures; endocrine system; exocrine glands; genitalia, female; genitalia, male; germ cells; hemic 
and immune system; liver; musculoskeletal system; nervous system; pancreas; respiratory system; 
sense organs; skin; stomatognathic system; unclassified/mixed; or urinary tract. Template sequences, 
component sequences, and cDNA library/tissue information are found in the LIFESEQ GOLD 
database (Incyte Genomics, Palo Alto CA). 
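
Compiling a tissue distribution profile amounts to tallying the tissue category of the library behind each component sequence. The following sketch assumes that the profile is expressed as the fraction of component sequences in each category, which the text does not specify; the library names and the function tissue_profile are hypothetical.

```python
from collections import Counter


def tissue_profile(component_libraries, library_category):
    """Return {tissue category: fraction of component sequences}.

    component_libraries: one library name per component sequence of a template.
    library_category: mapping from library name to its tissue category.
    """
    counts = Counter(library_category[lib] for lib in component_libraries)
    total = sum(counts.values())
    return {category: n / total for category, n in counts.items()}


# Hypothetical example: four component sequences drawn from three libraries.
example = tissue_profile(
    ["LIB_A", "LIB_A", "LIB_B", "LIB_C"],
    {"LIB_A": "nervous system", "LIB_B": "liver", "LIB_C": "nervous system"},
)
```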

VII. Transcript Image Analysis 

Transcript images are generated as described in Seilhamer et al., "Comparative Gene 
Transcript Analysis," U.S. Patent Number 5,840,484, incorporated herein by reference. 

VIII. Extension of Polynucleotide Sequences and Isolation of a Full-length cDNA 
Oligonucleotide primers designed using an mddt of the Sequence Listing are used to extend
the nucleic acid sequence. One primer is synthesized to initiate 5' extension of the template, and the
other primer, to initiate 3' extension of the template. The initial primers may be designed using
OLIGO 4.06 software (National Biosciences, Inc. (National Biosciences), Plymouth MN), or another
appropriate program, to be about 22 to 30 nucleotides in length, to have a GC content of about 50% or
more, and to anneal to the target sequence at temperatures of about 68 °C to about 72 °C. Any stretch
of nucleotides which would result in hairpin structures and primer-primer dimerizations is avoided.
Selected human cDNA libraries are used to extend the sequence. If more than one extension is
necessary or desired, additional or nested sets of primers are designed.
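
Two of the primer criteria above, length and GC content, are simple enough to check directly. The sketch below is not the OLIGO 4.06 algorithm; it treats the approximate limits in the text as hard thresholds for illustration, and it does not evaluate annealing temperature, hairpins, or primer-primer dimerization. The function names are hypothetical.

```python
def gc_fraction(primer):
    """Fraction of bases in the primer that are G or C."""
    seq = primer.upper()
    return (seq.count("G") + seq.count("C")) / len(seq)


def meets_basic_criteria(primer, min_len=22, max_len=30, min_gc=0.50):
    """Check only the length window and the minimum GC content."""
    return min_len <= len(primer) <= max_len and gc_fraction(primer) >= min_gc
```

A candidate that passes this check would still need to be screened for the remaining criteria with an appropriate primer design program.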

High fidelity amplification is obtained by PCR using methods well known in the art. PCR is
performed in 96-well plates using the PTC-200 thermal cycler (MJ Research). The reaction mix
contains DNA template, 200 nmol of each primer, reaction buffer containing Mg2+, (NH4)2SO4, and
β-mercaptoethanol, Taq DNA polymerase (Amersham Pharmacia Biotech), ELONGASE enzyme (Life
Technologies), and Pfu DNA polymerase (Stratagene), with the following parameters for primer pair
PCI A and PCI B: Step 1: 94°C, 3 min; Step 2: 94°C, 15 sec; Step 3: 60°C, 1 min; Step 4: 68°C, 2
min; Step 5: Steps 2, 3, and 4 repeated 20 times; Step 6: 68°C, 5 min; Step 7: storage at 4°C. In the
alternative, the parameters for primer pair T7 and SK+ are as follows: Step 1: 94°C, 3 min; Step 2:
94°C, 15 sec; Step 3: 57°C, 1 min; Step 4: 68°C, 2 min; Step 5: Steps 2, 3, and 4 repeated 20 times;
Step 6: 68°C, 5 min; Step 7: storage at 4°C.

The concentration of DNA in each well is determined by dispensing 100 µl PICOGREEN
quantitation reagent (0.25% (v/v); Molecular Probes) dissolved in 1X Tris-EDTA (TE) and 0.5 µl of
undiluted PCR product into each well of an opaque fluorimeter plate (Corning Incorporated
(Corning), Corning NY), allowing the DNA to bind to the reagent. The plate is scanned in a
FLUOROSKAN II (Labsystems Oy) to measure the fluorescence of the sample and to quantify the
concentration of DNA. A 5 µl to 10 µl aliquot of the reaction mixture is analyzed by electrophoresis
on a 1% agarose mini-gel to determine which reactions are successful in extending the sequence.

The extended nucleotides are desalted and concentrated, transferred to 384-well plates,
digested with CviJI cholera virus endonuclease (Molecular Biology Research, Madison WI), and
sonicated or sheared prior to religation into pUC 18 vector (Amersham Pharmacia Biotech). For
shotgun sequencing, the digested nucleotides are separated on low concentration (0.6 to 0.8%)
agarose gels, fragments are excised, and agar digested with AGAR ACE (Promega). Extended clones
are religated using T4 ligase (New England Biolabs, Inc., Beverly MA) into pUC 18 vector
(Amersham Pharmacia Biotech), treated with Pfu DNA polymerase (Stratagene) to fill-in restriction
site overhangs, and transfected into competent E. coli cells. Transformed cells are selected on
antibiotic-containing media, and individual colonies are picked and cultured overnight at 37 °C in
384-well plates in LB/2x carbenicillin liquid media.

The cells are lysed, and DNA is amplified by PCR using Taq DNA polymerase (Amersham 
Pharmacia Biotech) and Pfu DNA polymerase (Stratagene) with the following parameters: Step 1:
94°C, 3 min; Step 2: 94°C, 15 sec; Step 3: 60°C, 1 min; Step 4: 72°C, 2 min; Step 5: steps 2, 3, and 4
repeated 29 times; Step 6: 72°C, 5 min; Step 7: storage at 4°C. DNA is quantified by PICOGREEN
reagent (Molecular Probes) as described above. Samples with low DNA recoveries are reamplified
using the same conditions as described above. Samples are diluted with 20% dimethylsulfoxide (1:2,
v/v), and sequenced using DYENAMIC energy transfer sequencing primers and the DYENAMIC 
DIRECT kit (Amersham Pharmacia Biotech) or the ABI PRISM BIGDYE Terminator cycle 
sequencing ready reaction kit (PE Biosystems). 

In like manner, the mddt is used to obtain regulatory sequences (promoters, introns, and 
enhancers) using the procedure above, oligonucleotides designed for such extension, and an 
appropriate genomic library. 

IX. Labeling of Probes and Southern Hybridization Analyses 

Hybridization probes derived from the mddt of the Sequence Listing are employed for 
screening cDNAs, mRNAs, or genomic DNA. The labeling of probe nucleotides between 100 and 
1000 nucleotides in length is specifically described, but essentially the same procedure may be used 
with larger cDNA fragments. Probe sequences are labeled at room temperature for 30 minutes using 
a T4 polynucleotide kinase, γ-32P-ATP, and 0.5X One-Phor-All Plus (Amersham Pharmacia Biotech)
buffer and purified using a ProbeQuant G-50 Microcolumn (Amersham Pharmacia Biotech). The
probe mixture is diluted to 10^7 dpm/µg/ml hybridization buffer and used in a typical membrane-based
hybridization analysis. 

The DNA is digested with a restriction endonuclease such as Eco RV and is electrophoresed 
through a 0.7% agarose gel. The DNA fragments are transferred from the agarose to nylon membrane 
(NYTRAN Plus, Schleicher & Schuell, Inc., Keene NH) using procedures specified by the 
manufacturer of the membrane. Prehybridization is carried out for three or more hours at 68 °C, and 
hybridization is carried out overnight at 68 °C. To remove non-specific signals, blots are sequentially 
washed at room temperature under increasingly stringent conditions, up to 0.1x saline sodium citrate
(SSC) and 0.5% sodium dodecyl sulfate. After the blots are placed in a PHOSPHORIMAGER 
cassette (Molecular Dynamics) or are exposed to autoradiography film, hybridization patterns of 
standard and experimental lanes are compared. Essentially the same procedure is employed when 
screening RNA. 

X. Chromosome Mapping of mddt 

The cDNA sequences which were used to assemble SEQ ID NO: 1-14 are compared with 
sequences from the Incyte LIFESEQ database and public domain databases using BLAST and other 
implementations of the Smith-Waterman algorithm. Sequences from these databases that match SEQ 
ID NO: 1-14 are assembled into clusters of contiguous and overlapping sequences using assembly 
algorithms such as PHRAP (Table 5). Radiation hybrid and genetic mapping data available from
public resources such as the Stanford Human Genome Center (SHGC), Whitehead Institute for
Genome Research (WIGR), and Genethon are used to determine if any of the clustered sequences 
have been previously mapped. Inclusion of a mapped sequence in a cluster will result in the 
assignment of all sequences of that cluster, including its particular SEQ ID NO:, to that map location. 
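
The propagation of a map location through a cluster can be sketched as follows. The data layout and the function name assign_map_locations are hypothetical, and the handling of a cluster whose mapped members disagree (it is left unassigned here) is an assumption of this sketch that is not addressed in the text.

```python
def assign_map_locations(clusters, mapped):
    """Assign a known map location to every sequence in the same cluster.

    clusters: {cluster id: list of sequence ids}
    mapped:   {sequence id: map location} for previously mapped sequences
    Returns {sequence id: map location} for all sequences in clusters that
    contain exactly one distinct mapped location.
    """
    assigned = {}
    for members in clusters.values():
        locations = {mapped[s] for s in members if s in mapped}
        if len(locations) == 1:
            location = next(iter(locations))
            for s in members:
                assigned[s] = location
    return assigned
```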

The genetic map locations of SEQ ID NO: 1-14 are described as ranges, or intervals, of human
chromosomes. The map position of an interval, in centiMorgans, is measured relative to the terminus
of the chromosome's p-arm. (The centiMorgan (cM) is a unit of measurement based on recombination 
frequencies between chromosomal markers. On average, 1 cM is roughly equivalent to 1 megabase 
(Mb) of DNA in humans, although this can vary widely due to hot and cold spots of recombination.)
The cM distances are based on genetic markers mapped by Genethon which provide boundaries for
radiation hybrid markers whose sequences were included in each of the clusters. 

XI. Microarray Analysis

Probe Preparation from Tissue or Cell Samples 

Total RNA is isolated from tissue samples using the guanidinium thiocyanate method and
polyA+ RNA is purified using the oligo (dT) cellulose method. Each polyA+ RNA sample is reverse
transcribed using MMLV reverse-transcriptase, 0.05 pg/µl oligo-dT primer (21mer), 1X first strand
buffer, 0.03 units/µl RNase inhibitor, 500 µM dATP, 500 µM dGTP, 500 µM dTTP, 40 µM dCTP, 40
µM dCTP-Cy3 (BDS) or dCTP-Cy5 (Amersham Pharmacia Biotech). The reverse transcription
reaction is performed in a 25 µl volume containing 200 ng polyA+ RNA with GEMBRIGHT kits
(Incyte). Specific control polyA+ RNAs are synthesized by in vitro transcription from non-coding
yeast genomic DNA (W. Lei, unpublished). As quantitative controls, the control mRNAs at 0.002 ng,
0.02 ng, 0.2 ng, and 2 ng are diluted into reverse transcription reaction at ratios of 1:100,000,
1:10,000, 1:1000, 1:100 (w/w) to sample mRNA respectively. The control mRNAs are diluted into
reverse transcription reaction at ratios of 1:3, 3:1, 1:10, 10:1, 1:25, 25:1 (w/w) to sample mRNA
differential expression patterns. After incubation at 37°C for 2 hr, each reaction sample (one with
Cy3 and another with Cy5 labeling) is treated with 2.5 µl of 0.5M sodium hydroxide and incubated
for 20 minutes at 85°C to stop the reaction and degrade the RNA. Probes are purified using two
successive CHROMA SPIN 30 gel filtration spin columns (CLONTECH Laboratories, Inc.
(CLONTECH), Palo Alto CA) and after combining, both reaction samples are ethanol precipitated
using 1 µl of glycogen (1 mg/ml), 60 µl sodium acetate, and 300 µl of 100% ethanol. The probe is
then dried to completion using a SpeedVAC (Savant Instruments Inc., Holbrook NY) and
resuspended in 14 µl 5X SSC/0.2% SDS.

Microarray Preparation

Sequences of the present invention are used to generate array elements. Each array element 
is amplified from bacterial cells containing vectors with cloned cDNA inserts. PCR amplification 
uses primers complementary to the vector sequences flanking the cDNA insert. Array elements are 
amplified in thirty cycles of PCR from an initial quantity of 1-2 ng to a final quantity greater than 5
µg. Amplified array elements are then purified using SEPHACRYL-400 (Amersham Pharmacia
Biotech). 

Purified array elements are immobilized on polymer-coated glass slides. Glass microscope 
slides (Corning) are cleaned by ultrasound in 0.1% SDS and acetone, with extensive distilled water
washes between and after treatments. Glass slides are etched in 4% hydrofluoric acid (VWR
Scientific Products Corporation (VWR), West Chester, PA), washed extensively in distilled water,
and coated with 0.05% aminopropyl silane (Sigma) in 95% ethanol. Coated slides are cured in a 
110°C oven. 

Array elements are applied to the coated glass substrate using a procedure described in US
Patent No. 5,807,522, incorporated herein by reference. 1 µl of the array element DNA, at an average
concentration of 100 ng/µl, is loaded into the open capillary printing element by a high-speed robotic
apparatus. The apparatus then deposits about 5 nl of array element sample per slide.

Microarrays are UV-crosslinked using a STRATALINKER UV-crosslinker (Stratagene).
Microarrays are washed at room temperature once in 0.2% SDS and three times in distilled water.
Non-specific binding sites are blocked by incubation of microarrays in 0.2% casein in phosphate
buffered saline (PBS) (Tropix, Inc., Bedford, MA) for 30 minutes at 60°C followed by washes in
0.2% SDS and distilled water as before. 

Hybridization 

Hybridization reactions contain 9 µl of probe mixture consisting of 0.2 µg each of Cy3 and
Cy5 labeled cDNA synthesis products in 5X SSC, 0.2% SDS hybridization buffer. The probe mixture
is heated to 65°C for 5 minutes and is aliquoted onto the microarray surface and covered with a 1.8
cm^2 coverslip. The arrays are transferred to a waterproof chamber having a cavity just slightly larger
than a microscope slide. The chamber is kept at 100% humidity internally by the addition of 140 µl
of 5x SSC in a corner of the chamber. The chamber containing the arrays is incubated for about 6.5
hours at 60°C. The arrays are washed for 10 min at 45°C in a first wash buffer (1X SSC, 0.1% SDS),
three times for 10 minutes each at 45°C in a second wash buffer (0.1X SSC), and dried.

Detection 

Reporter-labeled hybridization complexes are detected with a microscope equipped with an
Innova 70 mixed gas 10 W laser (Coherent, Inc., Santa Clara CA) capable of generating spectral lines
at 488 nm for excitation of Cy3 and at 632 nm for excitation of Cy5. The excitation laser light is
focused on the array using a 20X microscope objective (Nikon, Inc., Melville NY). The slide
containing the array is placed on a computer-controlled X-Y stage on the microscope and
raster-scanned past the objective. The 1.8 cm x 1.8 cm array used in the present example is scanned with a
resolution of 20 micrometers.

WO 00/75298    PCT/US00/15344

In two separate scans, a mixed gas multiline laser excites the two fluorophores sequentially. 
Emitted light is split, based on wavelength, into two photomultiplier tube detectors (PMT R1477, 
Hamamatsu Photonics Systems, Bridgewater NJ) corresponding to the two fluorophores. Appropriate 
filters positioned between the array and the photomultiplier tubes are used to filter the signals. The
emission maxima of the fluorophores used are 565 nm for Cy3 and 650 nm for Cy5. Each array is
typically scanned twice, one scan per fluorophore using the appropriate filters at the laser source,
although the apparatus is capable of recording the spectra from both fluorophores simultaneously.

The sensitivity of the scans is typically calibrated using the signal intensity generated by a
cDNA control species added to the probe mix at a known concentration. A specific location on the
array contains a complementary DNA sequence, allowing the intensity of the signal at that location to
be correlated with a weight ratio of hybridizing species of 1:100,000. When two probes from
different sources (e.g., representing test and control cells), each labeled with a different fluorophore,
are hybridized to a single array for the purpose of identifying genes that are differentially expressed,
the calibration is done by labeling samples of the calibrating cDNA with the two fluorophores and
adding identical amounts of each to the hybridization mixture. 

The output of the photomultiplier tube is digitized using a 12-bit RTI-835H analog-to-digital 
(A/D) conversion board (Analog Devices, Inc., Norwood, MA) installed in an IBM-compatible PC 
computer. The digitized data are displayed as an image where the signal intensity is mapped using a 
linear 20-color transformation to a pseudocolor scale ranging from blue (low signal) to red (high
signal). The data are also analyzed quantitatively. Where two different fluorophores are excited and
measured simultaneously, the data are first corrected for optical crosstalk (due to overlapping
emission spectra) between the fluorophores using each fluorophore's emission spectrum.
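
The linear 20-color transformation of 12-bit data can be sketched as a simple binning step. The constants follow from the text (a 12-bit converter yields values from 0 to 4095), but the equal-width binning and the function name color_index are assumptions made for illustration; the actual display software is not described.

```python
ADC_MAX = 4095   # largest value from a 12-bit converter (2**12 - 1)
N_COLORS = 20    # number of pseudocolor steps, blue (low) to red (high)


def color_index(adc_value):
    """Map a 12-bit reading linearly onto color bins 0 (blue) to 19 (red)."""
    if not 0 <= adc_value <= ADC_MAX:
        raise ValueError("expected a 12-bit value between 0 and 4095")
    return adc_value * N_COLORS // (ADC_MAX + 1)
```

Under this binning, a reading of 0 falls in bin 0, a mid-scale reading of 2048 falls in bin 10, and a full-scale reading of 4095 falls in bin 19.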

A grid is superimposed over the fluorescence signal image such that the signal from each spot 
is centered in each element of the grid. The fluorescence signal within each element is then
integrated to obtain a numerical value corresponding to the average intensity of the signal. The 
software used for signal analysis is the GEMTOOLS gene expression analysis program (Incyte). 
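
The grid integration step can be sketched as below. This is not the GEMTOOLS implementation; it assumes a regular grid of square elements that exactly tiles the image, and the function name integrate_grid is hypothetical.

```python
def integrate_grid(image, cell):
    """Average pixel intensity within each cell x cell grid element.

    image: list of equal-length rows of pixel values, with both dimensions
           being multiples of `cell`.
    Returns a list of rows, one averaged value per grid element.
    """
    rows, cols = len(image), len(image[0])
    result = []
    for r0 in range(0, rows, cell):
        row_values = []
        for c0 in range(0, cols, cell):
            total = sum(
                image[r][c]
                for r in range(r0, r0 + cell)
                for c in range(c0, c0 + cell)
            )
            row_values.append(total / (cell * cell))
        result.append(row_values)
    return result
```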

XII. Complementary Nucleic Acids 

Sequences complementary to the mddt are used to detect, decrease, or inhibit expression of
the naturally occurring nucleotide. The use of oligonucleotides comprising from about 15 to 30 base
pairs is typical in the art. However, smaller or larger sequence fragments can also be used.
Appropriate oligonucleotides are designed from the mddt using OLIGO 4.06 software (National
Biosciences) or other appropriate programs and are synthesized using methods standard in the art or
ordered from a commercial supplier. To inhibit transcription, a complementary oligonucleotide is
designed from the most unique 5' sequence and used to prevent transcription factor binding to the 
promoter sequence. To inhibit translation, a complementary oligonucleotide is designed to prevent 
ribosomal binding and processing of the transcript. 
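
Deriving a complementary oligonucleotide from a chosen target region reduces to taking a reverse complement. The sketch below is illustrative only: the function names are hypothetical, the default length of 20 is simply a value within the 15 to 30 base range mentioned above, and selection of the target region itself is left to a design program such as the one named in the text.

```python
COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def complementary_oligo(target, start, length=20):
    """Oligonucleotide complementary to target[start:start + length], written 5' to 3'."""
    return reverse_complement(target[start:start + length])
```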

XIII. Expression of MDDT

Expression and purification of MDDT is accomplished using bacterial or virus-based 
expression systems. For expression of MDDT in bacteria, cDNA is subcloned into an appropriate 
vector containing an antibiotic resistance gene and an inducible promoter that directs high levels of 
cDNA transcription. Examples of such promoters include, but are not limited to, the trp-lac (tac)
hybrid promoter and the T5 or T7 bacteriophage promoter in conjunction with the lac operator
regulatory element. Recombinant vectors are transformed into suitable bacterial hosts, e.g.,
BL21(DE3). Antibiotic resistant bacteria express MDDT upon induction with isopropyl
beta-D-thiogalactopyranoside (IPTG). Expression of MDDT in eukaryotic cells is achieved by infecting
insect or mammalian cell lines with recombinant Autographa californica nuclear polyhedrosis virus
(AcMNPV), commonly known as baculovirus. The nonessential polyhedrin gene of baculovirus is
replaced with cDNA encoding MDDT by either homologous recombination or bacterial-mediated
transposition involving transfer plasmid intermediates. Viral infectivity is maintained and the strong
polyhedrin promoter drives high levels of cDNA transcription. Recombinant baculovirus is used to
infect Spodoptera frugiperda (Sf9) insect cells in most cases, or human hepatocytes, in some cases.
Infection of the latter requires additional genetic modifications to baculovirus. (See e.g., Engelhard,
supra; and Sandig, supra.)

In most expression systems, MDDT is synthesized as a fusion protein with, e.g., glutathione 
S-transferase (GST) or a peptide epitope tag, such as FLAG or 6-His, permitting rapid, single-step, 
affinity-based purification of recombinant fusion protein from crude cell lysates. GST, a
26-kilodalton enzyme from Schistosoma japonicum, enables the purification of fusion proteins on
immobilized glutathione under conditions that maintain protein activity and antigenicity (Amersham
Pharmacia Biotech). Following purification, the GST moiety can be proteolytically cleaved from
MDDT at specifically engineered sites. FLAG, an 8-amino acid peptide, enables immunoaffinity
purification using commercially available monoclonal and polyclonal anti-FLAG antibodies (Eastman
Kodak Company, Rochester NY). 6-His, a stretch of six consecutive histidine residues, enables
purification on metal-chelate resins (QIAGEN). Methods for protein expression and purification are
discussed in Ausubel (1995, supra, Chapters 10 and 16). Purified MDDT obtained by these methods
can be used directly in the following activity assay.

XIV. Demonstration of MDDT Activity

MDDT, or biologically active fragments thereof, are labeled with 125I Bolton-Hunter reagent.
(See, e.g., Bolton, A.E. and W.M. Hunter (1973) Biochem. J. 133:529-539.) Candidate molecules 
previously arrayed in the wells of a multi-well plate are incubated with the labeled MDDT, washed, 
and any wells with labeled MDDT complex are assayed. Data obtained using different 
concentrations of MDDT are used to calculate values for the number, affinity, and association of
MDDT with the candidate molecules. 

Alternatively, molecules interacting with MDDT are analyzed using the yeast two-hybrid 
system as described in Fields, S. and O. Song (1989) Nature 340:245-246, or using commercially 
available kits based on the two-hybrid system, such as the MATCHMAKER system (CLONTECH). 

MDDT may also be used in the PATHCALLING process (CuraGen Corp., New Haven CT)
which employs the yeast two-hybrid system in a high-throughput manner to determine all interactions
between the proteins encoded by two large libraries of genes (Nandabalan, K. et al. (2000) U.S. 
Patent No. 6,057,101). 

XV. Functional Assays

MDDT function is assessed by expressing mddt at physiologically elevated levels in 
mammalian cell culture systems. cDNA is subcloned into a mammalian expression vector containing 
a strong promoter that drives high levels of cDNA expression. Vectors of choice include pCMV 
SPORT (Life Technologies) and pCR3.1 (Invitrogen Corporation, Carlsbad CA), both of which
contain the cytomegalovirus promoter. 5-10 µg of recombinant vector are transiently transfected into
a human cell line, preferably of endothelial or hematopoietic origin, using either liposome
formulations or electroporation. 1-2 µg of an additional plasmid containing sequences encoding a
marker protein are co-transfected. 

Expression of a marker protein provides a means to distinguish transfected cells from
nontransfected cells and is a reliable predictor of cDNA expression from the recombinant vector.
Marker proteins of choice include, e.g., Green Fluorescent Protein (GFP; CLONTECH), CD64, or a 
CD64-GFP fusion protein. Flow cytometry (FCM), an automated laser optics-based technique, is 
used to identify transfected cells expressing GFP or CD64-GFP and to evaluate the apoptotic state of 
the cells and other cellular properties.

FCM detects and quantifies the uptake of fluorescent molecules that diagnose events
preceding or coincident with cell death. These events include changes in nuclear DNA content as
measured by staining of DNA with propidium iodide; changes in cell size and granularity as measured
by forward light scatter and 90 degree side light scatter; down-regulation of DNA synthesis as
measured by decrease in bromodeoxyuridine uptake; alterations in expression of cell surface and
intracellular proteins as measured by reactivity with specific antibodies; and alterations in plasma 
membrane composition as measured by the binding of fluorescein-conjugated Annexin V protein to 
the cell surface. Methods in flow cytometry are discussed in Ormerod, M. G. (1994) Flow
Cytometry, Oxford, New York NY.

The influence of MDDT on gene expression can be assessed using highly purified
populations of cells transfected with sequences encoding MDDT and either CD64 or CD64-GFP.
CD64 and CD64-GFP are expressed on the surface of transfected cells and bind to conserved regions 
of human immunoglobulin G (IgG). Transfected cells are efficiently separated from nontransfected 
cells using magnetic beads coated with either human IgG or antibody against CD64 (DYNAL, Inc.,
Lake Success NY). mRNA can be purified from the cells using methods well known by those of skill
in the art. Expression of mRNA encoding MDDT and other genes of interest can be analyzed by 
northern analysis or microarray techniques. 

XVI. Production of Antibodies 
MDDT substantially purified using polyacrylamide gel electrophoresis (PAGE; see, e.g.,
Harrington, M.G. (1990) Methods Enzymol. 182:488-495), or other purification techniques, is used to
immunize rabbits and to produce antibodies using standard protocols.

Alternatively, the MDDT amino acid sequence is analyzed using LASERGENE software
(DNASTAR) to determine regions of high immunogenicity, and a corresponding peptide is
synthesized and used to raise antibodies by means known to those of skill in the art. Methods for
selection of appropriate epitopes, such as those near the C-terminus or in hydrophilic regions, are well
described in the art. (See, e.g., Ausubel, 1995, supra, Chapter 11.)

Typically, peptides 15 residues in length are synthesized using an ABI 431A peptide
synthesizer (PE Biosystems) using fmoc-chemistry and coupled to KLH (Sigma) by reaction with
N-maleimidobenzoyl-N-hydroxysuccinimide ester (MBS) to increase immunogenicity. (See, e.g.,
Ausubel, supra.) Rabbits are immunized with the peptide-KLH complex in complete Freund's
adjuvant. Resulting antisera are tested for antipeptide activity by, for example, binding the peptide to
plastic, blocking with 1% BSA, reacting with rabbit antisera, washing, and reacting with
radioiodinated goat anti-rabbit IgG. Antisera with antipeptide activity are tested for anti-MDDT activity
using protocols well known in the art, including ELISA, RIA, and immunoblotting.

XVII. Purification of Naturally Occurring MDDT Using Specific Antibodies

Naturally occurring or recombinant MDDT is substantially purified by immunoaffinity
chromatography using antibodies specific for MDDT. An immunoaffinity column is constructed by
covalently coupling anti-MDDT antibody to an activated chromatographic resin, such as
CNBr-activated SEPHAROSE (Amersham Pharmacia Biotech). After the coupling, the resin is
blocked and washed according to the manufacturer's instructions.

Media containing MDDT are passed over the immunoaffinity column, and the column is
washed under conditions that allow the preferential absorbance of MDDT (e.g., high ionic strength
buffers in the presence of detergent). The column is eluted under conditions that disrupt
antibody/MDDT binding (e.g., a buffer of pH 2 to pH 3, or a high concentration of a chaotrope, such
as urea or thiocyanate ion), and MDDT is collected.

All publications and patents mentioned in the above specification are herein incorporated by 
reference. Various modifications and variations of the described method and system of the invention
will be apparent to those skilled in the art without departing from the scope and spirit of the
invention. Although the invention has been described in connection with specific preferred 
embodiments, it should be understood that the invention as claimed should not be unduly limited to 
such specific embodiments. Indeed, various modifications of the above-described modes for carrying 
out the invention which are obvious to those skilled in the field of molecular biology or related fields 

20 are intended to be within the scope of the following claims. 
TABLE 3

SEQ ID NO: Template ID Start Stop Frame Domain Type
1 222197.6 317 406 forward 2 SP
1 222197.6 901 984 forward 1 TM
2 227709.3 563 649 forward 2 SP
5 243096.6 3096 3182 forward 3 SP
6 244366.6 2801 2878 forward 2 TM
7 405313.4 2256 2333 forward 3 TM
7 405313.4 1503 1589 forward 3 TM
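
The coordinates in Table 3 can be sanity-checked with a few lines of arithmetic. The sketch below is illustrative only and is not part of the described methods. It assumes that Start and Stop are 1-based, inclusive nucleotide positions on the template, and that "forward n" names the forward reading frame whose codons begin at position n. Under those assumptions, each segment should span a whole number of codons and its start should fall in the stated frame. The names `DomainHit`, `codon_count`, and `frame_of` are hypothetical.

```python
from typing import NamedTuple


class DomainHit(NamedTuple):
    """One row of Table 3 (field meanings are assumptions, see the text above)."""
    seq_id: int
    template_id: str
    start: int    # assumed 1-based, inclusive nucleotide position
    stop: int     # assumed 1-based, inclusive nucleotide position
    frame: int    # forward reading frame: 1, 2, or 3
    domain: str   # domain type code exactly as listed in Table 3


# Rows transcribed from Table 3.
TABLE_3 = [
    DomainHit(1, "222197.6", 317, 406, 2, "SP"),
    DomainHit(1, "222197.6", 901, 984, 1, "TM"),
    DomainHit(2, "227709.3", 563, 649, 2, "SP"),
    DomainHit(5, "243096.6", 3096, 3182, 3, "SP"),
    DomainHit(6, "244366.6", 2801, 2878, 2, "TM"),
    DomainHit(7, "405313.4", 2256, 2333, 3, "TM"),
    DomainHit(7, "405313.4", 1503, 1589, 3, "TM"),
]


def codon_count(hit: DomainHit) -> int:
    """Return the number of whole codons spanned by a hit."""
    length = hit.stop - hit.start + 1
    if length % 3 != 0:
        raise ValueError(
            f"{hit.template_id}: {length} nt is not a whole number of codons"
        )
    return length // 3


def frame_of(position: int) -> int:
    """Forward frame (1-3) whose codons begin at a 1-based position (assumed convention)."""
    return (position - 1) % 3 + 1


for hit in TABLE_3:
    # Under the assumed convention, the start position should fall in the listed frame.
    assert frame_of(hit.start) == hit.frame
    print(hit.seq_id, hit.template_id, hit.domain, codon_count(hit), "codons")
```

Under these assumptions every row of Table 3 passes both checks, with segments of 26 to 30 codons.
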
TABLE 4

SEQ ID NO: Template ID Component ID Start Stop
1 222197.6 3989355H1 1 122
1 222197.6 3989355R6 1 462
1 222197.6 g1189739 58 533
1 222197.6 g1123521 56 494
1 222197.6 3417884H2 105 341
1 222197.6 3398916H1 111 329
1 222197.6 696738H1 228 480
1 222197.6 3387328H1 248 542
1 222197.6 3387328F6 248 705
1 222197.6 640954H1 499 771
1 222197.6 640954R1 499 841
1 222197.6 2674395H1 544 647
1 222197.6 4871937H1 609 809
1 222197.6 6014949H1 680 954
1 222197.6 1310167H1 729 952
1 222197.6 1310167F6 729 1153
1 222197.6 3422058H1 733 986
1 222197.6 1429773H1 748 1016
1 222197.6 1429773F6 748 1211
1 222197.6 4725459H1 770 889
1 222197.6 2692245H1 774 1025
1 222197.6 2692245F6 774 1300
1 222197.6 2658283H1 818 1051
1 222197.6 4402233H1 847 1083
1 222197.6 673783H1 847 1089
1 222197.6 487422H1 871 1123
1 222197.6 3928678H1 898 1175
1 222197.6 2641613F6 1019 1494
1 222197.6 2641613H1 1019 1259
1 222197.6 2770396H1 1027 1273
1 222197.6 2599469H1 1040 1311
1 222197.6 486115H1 1181 1456
1 222197.6 1626615H1 1247 1456
1 222197.6 1626615F6 1247 1728
1 222197.6 383522H1 1260 1526
1 222197.6 3355867H1 1261 1532
1 222197.6 3617236H1 1286 1573
1 222197.6 3510978H1 1300 1567
1 222197.6 1568105H1 1325 1446
1 222197.6 1571377H1 1325 1550
1 222197.6 3806389H1 1368 1628
1 222197.6 g774888 1369 1729
1 222197.6 2995341H1 1382 1634
1 222197.6 5547807H1 1412 1611
1 222197.6 619375H1 1501 1738
1 222197.6 g1962367 1501 1997
1 222197.6 2695323H1 1517 1790
1 222197.6 3142880H1 1518 1792
1 222197.6 3805357H1 1518 1820
1 222197.6 1962884H1 1538 1808
1 222197.6 1807088F6 1555 2037
1 222197.6 1400677H1 1628 1894
1 222197.6 811652H1 1650 1954
1 222197.6 3048110H1 1665 1918
1 222197.6 3048110F6 1665 1992
1 222197.6 3048102H1 1665 1965
1 222197.6 5059358H1 1668 1965
1 222197.6 2150138H1 1671 1925
1 222197.6 039720H1 1711 1970
1 222197.6 2434795H1 3030 3132
1 222197.6 2130701H1 3039 3136
1 222197.6 2647674H1 1740 1841
1 222197.6 3448915H1 1810 2065
1 222197.6 4377215H1 1841 2043
1 222197.6 4317587H1 1841 1916
1 222197.6 1851036H1 1841 2033
1 222197.6 5108761H1 1861 2107
1 222197.6 2893265H1 1873 2139
1 222197.6 1568060H1 1878 2083
1 222197.6 1568004H1 1878 2097
1 222197.6 3531152H1 1889 2207
1 222197.6 2851378H1 1922 2260
1 222197.6 1931202H1 1922 2196
1 222197.6 2132091R6 1936 2208
1 222197.6 2132091H1 1936 2096
1 222197.6 5288578H1 1941 2067
1 222197.6 g2110737 1944 2226
1 222197.6 2207507H1 1996 2250
1 222197.6 2345452H1 1996 2256
1 222197.6 3414767H1 2025 2265
1 222197.6 1306171F6 2042 2378
1 222197.6 1306171H1 2042 2284
1 222197.6 4588038H1 2066 2349
1 222197.6 4587760H1 2066 2221
1 222197.6 1389765H1 2108 2367
1 222197.6 g1137612 2112 2425
1 222197.6 2663717H1 2146 2388
1 222197.6 3321090H1 2152 2435
1 222197.6 3840347H1 2151 2339
1 222197.6 2415559F6 2175 2618
1 222197.6 2415559H1 2175 2420
1 222197.6 3146224H1 2178 2430
1 222197.6 4201740H1 2178 2451
1 222197.6 3713261H1 2194 2446
1 222197.6 5900620H1 2195 2484
1 222197.6 1239238H1 2205 2358
1 222197.6 1965353R6 2235 2691
1 222197.6 1965353H1 2235 2500
1 222197.6 1471606H1 2273 2483
1 222197.6 3929839H1 2278 2578
1 222197.6 1521966H1 2278 2474
1 222197.6 1449385H1 2291 2533
1 222197.6 2381896H1 2307 2560
1 222197.6 2381895H1 2307 2559
1 222197.6 4643037H1 2326 2554
1 222197.6 3703243H1 2351 2650
1 222197.6 4295130H1 2350 2618
1 222197.6 4296184H1 2350 2589
1 222197.6 5841522H2 2396 2675
1 222197.6 3790395F6 2413 2974
1 222197.6 3811615H1 2413 2745
1 222197.6 2353349H1 2414 2513
1 222197.6 569817H1 2413 2660
1 222197.6 1621792H1 2413 2625
1 222197.6 520835H1 2417 2637
1 222197.6 g2161759 2433 2797
1 222197.6 1463438H1 2433 2622
1 222197.6 1621792T6 2437 3096
1 222197.6 3475011H1 2442 2682
1 222197.6 1969764H1 2447 2686
1 222197.6 4188958H1 2451 2774
1 222197.6 2134836H1 2458 2578
1 222197.6 4054390H1 2467 2749
1 222197.6 5185060H1 2468 2694
1 222197.6 4058390H1 2468 2580
1 222197.6 4024007H1 2469 2784
1 222197.6 5597388H1 2493 2771
1 222197.6 3934968H1 2512 2787
1 222197.6 2770396T6 2518 3095
1 222197.6 993964H1 2526 2698
1 222197.6 1807088T6 2531 3099
1 222197.6 2132091T6 2530 3101
1 222197.6 1965395T6 2532 3098
1 222197.6 1805709H1 2532 2781
1 222197.6 4466288H1 2537 2803
1 222197.6 3020435H1 2537 2821
1 222197.6 g2355832 2538 3035
1 222197.6 1672661H1 2554 2667
1 222197.6 1881147H1 2554 2807
1 222197.6 5098316H1 2573 2856
1 222197.6 1429773T6 2573 3089
1 222197.6 1626615T6 2584 3091
1 222197.6 1479854T6 2587 3117
1 222197.6 3935053H1 2598 2897
1 222197.6 3930918H1 2598 2915
1 222197.6 1654064H1 2608 2850
1 222197.6 2951301H1 2619 2908
1 222197.6 g4223642 2627 3028
1 222197.6 2752320H1 2628 2928
1 222197.6 g2161260 2634 3031
1 222197.6 1306171T6 2636 3099
1 222197.6 3387328T6 2640 3092
1 222197.6 811652T6 2639 3097
1 222197.6 1959841H1 2652 2915
1 222197.6 1959841T6 2652 3094
1 222197.6 1959841R6 2652 3110
1 222197.6 g4265714 2655 3143
1 222197.6 g4186863 2662 3139
1 222197.6 g3249761 2663 3146
1 222197.6 g4114970 2663 3136
1 222197.6 705080H1 2666 2908
1 222197.6 2045743H1 2673 2971
1 222197.6 2416559T6 2675 3097
1 222197.6 g4394362 2678 3067
1 222197.6 g2617967 2680 3140
1 222197.6 g1548565 2685 3028
1 222197.6 3790395T6 2684 3120
1 222197.6 g4393109 2684 3136
1 222197.6 1915761H1 2689 2948
1 222197.6 g2153774 2702 3136
1 222197.6 996350H1 2729 2980
1 222197.6 996350R1 2729 3028
1 222197.6 996350T1 2729 2992
1 222197.6 997484H1 2731 3034
1 222197.6 480176T6 2736 2990
1 222197.6 480176R6 2736 3028
1 222197.6 912187H1 2744 3042
1 222197.6 1313558H1 2744 3006
1 222197.6 g656198 2763 3141
1 222197.6 5057011H1 2784 3089
1 222197.6 2260766H1 2792 3062
1 222197.6 g3737413 2803 3139
1 222197.6 1686080H1 2810 3041
1 222197.6 g821623 2838 3148
1 222197.6 2641613T6 2833 3094
1 222197.6 2328044H1 2845 3113
1 222197.6 g4371777 2859 3141
1 222197.6 g1516072 2860 3144
1 222197.6 g2433045 2865 3091
1 222197.6 2648474H1 2866 3123
1 222197.6 2434653H1 2871 3069
1 222197.6 187807QH1 ^OO 1 o m/
1 222197.6 g4109641 2896 3139
1 222197.6 2428514H1 2909 3098
1 222197.6 4146636H1 2909 3172
1 222197.6 4703838H1 2929 3139
1 222197.6 3125420H1 2934 3139
1 222197.6 5942277H1 2971 3137
2 227709.3 783646H1 1577 1867
2 227709.3 2314211H1 1585 1834
2 227709.3 342368H1 1595 1832
2 227709.3 1833241R6 1595 2004
2 227709.3 1833241H1 1595 1853
2 227709.3 154117H1 1609 1752
2 227709.3 4938186H1 1617 1905
2 227709.3 4912871H1 1628 1918
2 227709.3 2243937H1 1653 1871
2 227709.3 876350H1 1530 1682
2 227709.3 4959782H1 1532 1785
2 227709.3 1839309H1 1663 1974
2 227709.3 1378304H1 1695 1940
2 227709.3 1660925H1 1696 1936
2 227709.3 1894917H1 1696 1912
2 227709.3 2394914H1 1716 1815
2 227709.3 3035674H1 1714 2034
2 227709.3 808578H1 1714 2019
2 227709.3 3035136H1 1714 2035
2 227709.3 808578R1 1714 2351
2 227709.3 3852902H1 1716 1984
2 227709.3 3122052H1 1729 2075
2 227709.3 6107590H1 1729 2046
2 227709.3 5925422H1 1735 2039
2 227709.3 4378762H1 1746 2059
2 227709.3 3781168H1 1763 1958
2 227709.3 2469542H1 1763 2006
2 227709.3 4585384H1 1765 2071
2 227709.3 3772159H1 1772 2087
2 227709.3 1436901F6 1787 2216
2 227709.3 1436902H1 1787 2081
2 227709.3 1436902F1 1787 2417
2 227709.3 732765H1 1796 2043
2 227709.3 531886H1 1796 2064
2 227709.3 732765R1 1796 2359
2 227709.3 323492H1 1796 2072
2 227709.3 1755142H1 1811 2072
2 227709.3 2088594H1 1817 2083
2 227709.3 1531927H1 1820 2036
2 227709.3 1281210H1 1828 1963
2 227709.3 618185H1 1834 2133
2 227709.3 072066H1 1833 2072
2 227709.3 920524H1 1837 2173
2 227709.3 g2030053 lo4U [illegible]
2 227709.3 g681548 1845 2248
2 227709.3 g1190789 1847 2173
2 227709.3 3052574H1 1853 2158
2 227709.3 4546684H1 1856 1963
2 227709.3 g1846206 1855 2184
2 227709.3 1231442H1 1856 2170
2 227709.3 4546692H1 1856 1961
2 227709.3 1231220H1 1856 2108













WO 00/75298 



PCT/US00/15344 



2 227709.3 5693680H1 1861 2153
2 227709.3 2040451H1 1861 2194
2 227709.3 5781387H1 1861 2129
2 227709.3 3506145H1 1861 2181
2 227709.3 3479633M1 1859 2214
2 227709.3 9 1 OA^AOH I 1861 2123
2 227709.3 3872090H1 1867 2086
2 227709.3 i 9 i 0043P l 1867 2217
2 227709.3 079A7AW1 1867 2104
2 227709.3 ^1 90970W1 1867 2163
2 227709.3 1 91 OO/l^Wl 1 A<S7 9127
2 227709.3 g616409 2341 2660
2 227709.3 90174S8H1 2340 2619
2 227709.3 1^3437^1 2342 2572
2 227709.3 3145020H1 2344 2683


2 227709.3 1539734H1 2358 2600
2 227709.3 3786174H1 2363 2661
2 227709.3 1454741F1 2367 2687
2 227709.3 1454741H1 2367 2632
2 227709.3 2717951H1 2372 2548
2 227709.3 1359816H1 2374 2618
2 227709.3 1359816F1 2374 2694
2 227709.3 g564645 2385 2694
2 227709.3 g1154316 2388 2698
2 227709.3 g891561 2393 2707
2 227709.3 2473312H1 2393 2648
2 227709.3 g2992779 2394 2643
2 227709.3 1218580T6 2395 2649
2 227709.3 1218580T1 2395 2649
2 227709.3 g824343 2396 2719
2 227709.3 1218580R6 2395 2691
2 227709.3 1218573H1 2395 2641
2 227709.3 286961H1 2403 2690
2 227709.3 4367190H1 2408 2684
2 227709.3 2399122H1 2420 2672
2 227709.3 5882896H1 2423 2687
2 227709.3 g967282 2422 2700
2 227709.3 5883904H1 2423 2687
2 227709.3 5883908H1 2423 2687
2 227709.3 5882375H1 2424 2567
2 227709.3 g1191305 2429 2710
2 227709.3 794290H1 2437 2664
2 227709.3 520742H1 2442 2678
2 227709.3 g646174 2443 2687
2 227709.3 1538631H1 2443 2651
2 227709.3 1722285H1 2444 2677
2 227709.3 g1202716 2469 2701
2 227709.3 862903T1 2482 2648
2 227709.3 095638H1 2484 2694
2 227709.3 862903R1 2484 2694
2 227709.3 3124846H1 2490 2692
2 227709.3 5906470H1 2497 2691
2 227709.3 2535162H1 2525 2651
2 227709.3 4460277H1 2533 2694
2 227709.3 372978T6 2539 2649
2 227709.3 372978H1 2539 2686
3 237703.2 g1963754 1 374
3 237703.2 g1137733 95 407
3 237703.2 g843567 124 414
3 237703.2 3070350H1 189 477
3 237703.2 3070350F6 189 709
3 237703.2 1439542H1 413 686
3 237703.2 3203352H1 437 711
3 237703.2 g2013304 598 954
3 237703.2 3799002H1 701 1010
3 237703.2 044160T6 958 1453
3 237703.2 824258H1 1020 1254
3 237703.2 g1894392 1031 1472
3 237703.2 3491432H1 1052 1315
3 237703.2 2601554F6 1059 1602
3 237703.2 2601554H1 1060 1336
3 237703.2 4617816H1 1073 1337
3 237703.2 4058186H1 1111 1197
3 237703.2 5554248H1 1145 1355
3 237703.2 5554148H1 1145 1386
3 237703.2 g3594366 1199 1611
3 237703.2 5122547H1 1278 1516
3 237703.2 9278690H1 1379 1654
3 237703.2 2278690R6 1379 1879
3 237703.2 5564683H1 1402 1658
3 237703.2 950519H1 1436 1678
3 237703.2 05051QP6 1436 1723
3 237703.2 nl 19?529 1437 1805
3 237703.2 59<S09?7H1 1514 1744
3 237703.2 ?? A/V^QH 1 1527 1744
3 237703.2 ^97907nm 1532 1780
3 237703.2 1AOAA791-1 1 1533 1838
3 237703.2 0*W^ 1 OTA 1554 2019
3 237703.2 710959H1 1592 1986
3 237703.2 A A 1 1 yl^Wl 1 A/S0 1947
3 237703.2 nl0??940 1669 2147
3 237703.2 ZOU 1 JvW 1 u 1698 2313
3 237703.2 A94957T7S 1699 2313
3 237703.2 OA^O 1 1ATA 1776 2312
3 237703.2 1 0 / z^+uy ru 1788 2150
3 237703.2 i 0/ z^uyn 1 1788 2062
3 237703.2 1 57941 RH1 1798 1997
3 237703.2 1872409T6 1805 2312
3 237703.2 530381H1 1810 1963
3 237703.2 5583955H1 1811 2075
3 237703.2 1 96949H 1 1816 2025
3 237703.2 997AA90T6 1829 2311
3 237703.2 1911 OAzH-ll 1832 2066
3 237703.2 9703597H1 1832 2106
3 237703.2 ?95?906H1 1885 2159
3 237703.2 n?0?4991 1896 2349
3 237703.2 1 A9097?M1 1897 2117
3 237703.2 g2819399 1908 2351
3 237703.2 g3895924 1936 2349
3 237703.2 n9881 190 1962 2270
3 237703.2 040587H1 1971 2158
3 237703.2 g3147053 1973 2349
3 237703.2 g843522 2009 2349
3 237703.2 [illegible] 2023 2349
3 237703.2 n9AA1700 9094 2349


3 237703.2 g2820075 2030 2349
3 237703.2 g2237723 2051 2350
3 237703.2 238512H1 2085 2313
3 237703.2 292954H1 2182 2320
3 237703.2 g2013921 2238 2522
3 237703.2 g1980268 2386 2742
4 240091.1 2898155H1 1 289
4 240091.1 2434264H1 3 215
4 240091.1 5075278H1 3 127
4 240091.1 2647594H1 1720 1770
4 240091.1 4785026H1 1803 2073
4 240091.1 4785001H1 1803 2069
4 240091.1 3600323H1 1258 1557
4 240091.1 2448218F6 1267 1485
4 240091.1 2448218H1 1267 1508
4 240091.1 1391961T6 1267 1730
4 240091.1 083349H1 1309 1464
4 240091.1 070643H1 1309 1543
4 240091.1 3712971T6 1317 1744
4 240091.1 3399693H1 1384 1606
4 240091.1 g3000643 1400 1562
4 240091.1 g3659260 1446 1772
4 240091.1 3491102H1 1447 1559
4 240091.1 99B6B46H 1 1496 1696
4 240091.1 4575130H1 1513 1756
4 240091.1 2584803H1 1539 1770
4 240091.1 2584803F6 1539 1770
4 240091.1 4399541H1 1590 1833
4 240091.1 489490H1 1599 1844
4 240091.1 S541073H1 1605 1804
4 240091.1 797486H1 1609 1772
4 240091.1 9435453H1 3 231
4 240091.1 9434264R6 3 491
4 240091.1 527914H1 4 275
4 240091.1 43A9936H 1 10 241
4 240091.1 4910908H1 20 292
4 240091.1 V>1^238H1 47 340
4 240091.1 3615238F6 47 528
4 240091.1 9733107H1 54 275
4 240091.1 494380H1 62 307
4 240091.1 1391961F6 66 475
4 240091.1 1391961H1 66 318
4 240091.1 580112H1 359 558
4 240091.1 1232706F6 389 842
4 240091.1 1232706H1 389 629
4 240091.1 3487133H1 432 698
4 240091.1 g4244249 511 981
4 240091.1 447619H1 580 799
4 240091.1 5782214H1 670 964


4 240091.1 4913546F6 830 1249
4 240091.1 4913546H1 830 1108
4 240091.1 3892111H1 834 1130
4 240091.1 4742244H1 836 1102
4 240091.1 g1484624 867 1316
4 240091.1 2376485F6 901 1205
4 240091.1 2376485H1 901 1124
4 240091.1 2376485T6 902 1167
4 240091.1 1849607H1 907 1198
4 240091.1 4030992H1 952 1198
4 240091.1 2764886H1 975 1201
4 240091.1 4665206H1 1138 1401
4 240091.1 1232706T6 1176 1729
4 240091.1 2434264T6 1210 1746
4 240091.1 4424318H1 1223 1459
5 243096.6 g3228919 1006 1482
5 243096.6 g3597815 1016 1479
5 243096.6 g3933703 1018 1479
5 243096.6 g4372582 1022 1487
5 243096.6 4103071H1 1030 1104
5 243096.6 3478526H1 1032 1183
5 243096.6 g4457447 1045 1481
5 243096.6 g4222324 1075 1480
5 243096.6 631209R6 1076 1478
5 243096.6 g3118595 1075 1478
5 243096.6 g2577165 1076 1480
5 243096.6 g2185952 1078 1493
5 243096.6 g2063697 1079 1482
5 243096.6 222052H1 1079 1215
5 243096.6 222052F1 1078 1479
5 243096.6 222052R1 1078 1479
5 243096.6 g3737532 1083 1509
5 243096.6 1849724T6 1095 1440
5 243096.6 g4110131 1115 1481
5 243096.6 631209T6 1116 1438
5 243096.6 2446727T6 1119 1438
5 243096.6 g3155321 1129 1475
5 243096.6 g3178176 1148 1494
5 243096.6 948177H1 1149 1428
5 243096.6 948177R1 1149 1488
5 243096.6 1942176H1 1163 1441
5 243096.6 1942176R6 1163 1418
5 243096.6 1942168H1 1163 1440
5 243096.6 5884138H1 1164 1433
5 243096.6 1418514T6 1209 1430
5 243096.6 1418514H1 1216 1431
5 243096.6 1418364H1 1216 1468
5 243096.6 1418514F6 1216 1479
5 243096.6 632343H1 1218 1458
5 243096.6 g4107711 1220 1526
5 243096.6 g4457962 1227 1483
5 243096.6 g2837785 1228 1479
5 243096.6 g819991 1238 1496
5 243096.6 g564440 1237 1488
5 243096.6 g816379 1251 1540
5 243096.6 g885380 1252 1488
5 243096.6 g768804 1261 1481
5 243096.6 6093263H1 1263 1492
5 243096.6 g645318 1286 1488
5 243096.6 g566867 1292 1 ZOA
5 243096.6 g816311 1302 io/y
5 243096.6 g671079 I2YO [illegible]
5 243096.6 g2219072 1297 [illegible]
5 243096.6 g2539665 1300 14/V
5 243096.6 g670466 1300 lozo
5 243096.6 2474482H1 1313 I04z


5 243096.6 g4328047 1314 1487
5 243096.6 g2205935 1335 1454
5 243096.6 g832021 1367 1678
5 243096.6 g2205936 1375 14/2
5 243096.6 g2789365 1393 1479
5 243096.6 g873007 1418 1540
5 243096.6 g900098 1419 1488
5 243096.6 g567639 1424 1667
5 243096.6 1918362H1 1473 1731
5 243096.6 4727611H1 1576 1854
5 243096.6 g822245 1648 1976
5 243096.6 g812869 1651 2012
5 243096.6 g830918 1654 2012
5 243096.6 1414612H1 1662 1890
5 243096.6 4761250H1 1662 1939
5 243096.6 g678372 1665 1972
5 243096.6 g561207 1665 1955
5 243096.6 g2002379 1665 2002
5 243096.6 g709471 1665 1866
5 243096.6 4761242H1 1664 1939
5 243096.6 g518391 1678 1933
5 243096.6 4595686H1 1707 1981
5 243096.6 1511493H1 1792 1996
5 243096.6 1511493F6 1792 2244
5 243096.6 1512376H1 1792 2009
5 243096.6 g2003356 1922 2087
5 243096.6 4697285H1 2203 2446
5 243096.6 4941432H1 2381 2673
5 243096.6 1230891H1 2413 2508
5 243096.6 1522037H1 2421 2625
5 243096.6 3749969H1 2502 2799
5 243096.6 2125142H1 2527 2795
5 243096.6 2125142F6 2527 2841
5 243096.6 121562H1 2620 2806
5 243096.6 856668H1 2745 2933
5 243096.6 5882521H1 2758 3030
5 243096.6 5888582H1 2759 2976
5 243096.6 5882569H1 2760 3030
5 243096.6 g775350 2793 3137
5 243096.6 g705857 2790 3138
5 243096.6 g2002380 2803 3138
5 243096.6 5927949H1 2845 3140
5 243096.6 1511493T6 2883 3500
5 243096.6 1335311H1 295) oAJO
5 243096.6 1613745H1 2965 6 1 / y
5 243096.6 3472823H1 3005 0V40
5 243096.6 g570224 3048 JO 1 o
5 243096.6 g4095588 3134 0040
5 243096.6 g831152 3143 oooo
5 243096.6 g4286632 3205 34/0


5 243096.6 5907720H1 3208 3501
5 243096.6 g4187457 3286 3557
5 243096.6 g3842315 3295 3465
5 243096.6 g4005713 3303 3465
5 243096.6 g4006389 3305 3465
5 243096.6 g4006377 3305 3559
5 243096.6 g4006150 3305 3537
5 243096.6 g4006070 3315 3542
5 243096.6 g4187003 3315 3554
5 243096.6 g4188554 3315 3537
5 243096.6 g4006771 3315 3537
5 243096.6 g4072007 3316 3542
5 243096.6 g4017934 3316 3537
5 243096.6 g4150328 3316 3465
5 243096.6 g4005644 3316 3537
5 243096.6 5840086H1 3345 3553
5 243096.6 5289394H1 3472 3737
5 243096.6 g710217 3508 3789
5 243096.6 g694295 3619 3781
5 243096.6 g2206232 3623 3794
5 243096.6 g2206104 3656 3795
5 243096.6 2897215H1 1 249
5 243096.6 3541808H1 181 397
5 243096.6 2352032H1 32 249
5 243096.6 2446727F6 44 104
5 243096.6 3123367H1 44 356
5 243096.6 4385825H1 181 379
5 243096.6 2446727H1 44 308
5 243096.6 2905666H1 45 326
5 243096.6 2767616H1 46 308
5 243096.6 4521271H1 214 473
5 243096.6 1725750H1 47 209
5 243096.6 g1965606 235 621
5 243096.6 3117919H1 47 328
5 243096.6 2762827H1 49 [illegible]
5 243096.6 5395762H1 245 510
5 243096.6 5585677H1 251 484
5 243096.6 3416289H1 253 507
5 243096.6 5407275H1 257 511
5 243096.6 5407149H1 257 520
5 243096.6 3452689H1 49 240
5 243096.6 4819033H1 292 515
5 243096.6 483458H1 50 302
5 243096.6 1941753H1 291 53/
5 243096.6 485741H1 50 296
5 243096.6 g766717 307 480
5 243096.6 2215901H1 51 147
5 243096.6 2990593H1 73 383
5 243096.6 4017755H1 76 378
5 243096.6 2558416H1 81 363
5 243096.6 870507R1 82 682
5 243096.6 870507H1 82 339
5 243096.6 3692401H1 82 285
5 243096.6 2387055H1 83 336
5 243096.6 1641267H1 329 555
5 243096.6 2483682H1 84 331
5 243096.6 4977750H1 89 382
5 243096.6 g4244257 343 817
5 243096.6 3165660H1 89 373
5 243096.6 4043243H1 89 406
5 243096.6 3500940H1 347 625
5 243096.6 3580985H1 91 415
5 243096.6 1518953F6 93 405
5 243096.6 g672832 92 414
5 243096.6 g574622 92 418
5 243096.6 2206923H1 355 624
5 243096.6 2681788H1 92 287
5 243096.6 g672843 92 444
5 243096.6 790680R1 365 938
5 243096.6 790680H1 365 584
5 243096.6 3500449H1 386 701
5 243096.6 1518953H1 92 280
5 243096.6 3510335H1 92 396
5 243096.6 4401867H1 393 653
5 243096.6 1624276H1 395 583
5 243096.6 3099469H1 92 415
5 243096.6 g873107 93 484
5 243096.6 g874944 93 492
5 243096.6 2201243H1 411 667
5 243096.6 4907323H2 97 377
5 243096.6 2215590H1 421 665
5 243096.6 1647105H1 102 323
5 243096.6 3337242H1 105 332
5 243096.6 3328567H1 421 709
5 243096.6 5165830H1 113 OV 1
5 243096.6 1919378R6 432 865
5 243096.6 2078775H1 114 391
5 243096.6 1919378H1 432 700
5 243096.6 1798353H1 115 371
5 243096.6 5109893H1 447 675
5 243096.6 3581083H1 116 378
5 243096.6 2202470H1 456 711
5 243096.6 1642210H1 461 676
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SEQ ID NO: 


Template ID 


Component ID 


Start 


Stop 


5 


243096.6 


282881 7H1 


122 


399 


5 


243096.6 


1642206H1 


461 


676 


5 


243096.6 


g669309 


136 


449 


5 


243096.6 


493261 6H1 


463 


618 


5 


243096.6 


g571269 


136 


492 


5 


243096.6 


1753890H1 


464 


690 


5 


243096.6 


75621 7H1 


464 


706 


5 


243096.6 


3030389H1 


464 


764 


5 


243096.6 


g677690 


136 


462 


5 


243096.6 


1754027H1 


464 


704 


5 


243096.6 


3893562H1 


139 


449 


5 


243096.6 


g885379 


139 


482 


5 


243096.6 


026083H1 


509 


692 


5 


243096.6 


2758492H1 


143 


413 


5 


243096.6 


836506R1 


513 


1092 


5 


243096.6 


5294950H1 


147 


395 


5 


243096.6 


836506H1 


513 


759 


5 


243096.6 


1520256H1 


517 


685 


5 


243096.6 


173031 HI 


157 


390 


5 


243096.6 


5197820H2 


161 


419 


5 


243096.6 


g2240993 


522 


. 940 


5 


243096.6 


g766746 


161 


395 


5 


243096.6 


3728525H1 


572 


854 


5 


243096.6 


g677072 


160 


502 


5 


243096.6 


28281 15T6 


574 


950 


5 


243096.6 


g28 16446 


612 


874 


5 


243096.6 


4536339H1 


624 


877 


5 


243096.6 


2670757H1 


629 


868 


5 


243096.6 


4994228H1 


697 


1004 


5 


243096.6 


793596H1 


697 


946 


5 


243096.6 


g2058963 


697 


941 


5 


243096.6 


1560602H1 


719 


948 


5 


243096.6 


1535660H1 


719 


898 


5 


243096.6 


g2058866 


730 


936 


5 


243096.6 


2316449H1 


764 


1045 


5 


243096.6 


686656H1 


771 


1028 


5 


243096.6 


3728525T1 


783 


1432 


5 


243096.6 


3659224H2 


789 


1073 


5 


243096.6 


639411 HI 


798 


1050 


5 


243096.6 


5332257H1 


820 


1063 


5 


243096.6 


2084254H1 


837 


1136 


5 


243096.6 


2652025T6 


859 


1424 


5 


243096.6 


961879T6 


859 


1437 


5 


243096.6 


50881 78T6 


858 


1465 


5 


243096.6 


1849724F6 


860 


1441 


5 


243096.6 


1849724H1 ' 


860 


1135 


5 


243096.6 


5395762T1 


884 


1440 


5 


243096.6 


1919378T6 


880 


1452 


5 


243096.6 


2663785H1 


891 


1144 


5 


243096.6 


3625012H1


904 


1051 
SEQ ID NO:


Template ID 


Component ID 


Start 


Stop 


5 


243096.6 


2683022H1 


931 


1212 


5 


243096.6


3499439H1 


933 


1219 


5 


243096.6 


4005888H1 


935 


1211


5 


243096.6 


1682469T7 


939 


1485 


5 


243096.6 


2351530H1 


941 


1130


5 


243096.6 


g2063947 


951 


1207 


5 


243096.6 


1668866H1 


960 


1201 


5 


243096.6 


1667633H1 


960 


1220 


5 


243096.6 


g2358923 


963 


1060 


5 


243096.6 


3085780H1 


971 


1082 


5 


243096.6 


g814507


161 


443 


5 


243096.6 


g830479 


161 


470 


5 


243096.6 


g816410 


162 


519 


6 


244366.6 


1889554H1 


493 


750 


6 


244366.6 


1889554F6 


493 


939 


6 


244366.6 


4298324H1 


1 


255 


6 


244366.6 


853003H1 


12 


225 


6 


244366.6 


g2178494


517 


912 


6 


244366.6 


853003R6 


26 


483 


6 


244366.6 


3327565H1 


555 


787 


6 


244366.6 


2263295H1 


278 


532 


6 


244366.6 


5401026H1 


604 


816 


6 


244366.6 


1285225H1 


606 


862 


6 


244366.6 


2674162H1 


661 


904 


6 


244366.6 


3101288H1 


295 


585 


6 


244366.6 


3295139H1


815 


1056 


6 


244366.6 


6002940H1 


886 


1170 


6 


244366.6 


3101288F6 


295 


694 


6 


244366.6 


6002740H1 


904 


1170 


6 


244366.6 


3246058F6 


941 


1282 


6 


244366.6 


3246058H1 


941 


1192 


6 


244366.6 


3887233H1 


959 


1240 


6 


244366.6 


2431320H1 


972 


1192 


6 


244366.6 


1513444H1 


978 


1189 


6 


244366.6 


2813740H1 


1071 


1363 


6 


244366.6 


2815664H1 


1071 


1274 


6 


244366.6 


2813707H1 


1071 


1359 


6 


244366.6 


3492628H1 


1138 


1414 


6 


244366.6 


2183893H1 


1190 


1450 


6 


244366.6 


5641164H1 


1238 


1485 


6 


244366.6 


580082H1 


1239 


1487 


6 


244366.6 


3155135H1 


1254 


1487 


6 


244366.6 


3075416H1 


1282 


1565 


6 


244366.6 


3559024H1 


1351 


1639 


6 


244366.6 


3451987H1 


1525 


1785 


6 


244366.6 


4378692H1 


1597 


1817 


6 


244366.6 


g2162961


1742 


2237 


6 


244366.6 


3890528H1 


1764 


1919 


6 


244366.6 


5017346H1 


3006 


3272 


6 


244366.6 


1690531H1 


2938 


3105 
SEQ ID NO:


Template ID 


Component ID 


Start 


Stop 


6 


244366.6 


5890083H1 


3007 


3133 


6 


244366.6 


4614333H1 


3009 


3145 


6 


244366.6 


2783117H1


2979 


3239 


6 


244366.6 


3037603H1 


3012 


3292 


6 


244366.6 


032789H1 


3013 


3234 


6 


244366.6 


5262767H2 


3029 


3291 


6 


244366.6 


2626903H1 


3031 


3283 


6 


244366.6 


1403785H1 


3034 


3328 


6 


244366.6 


3321234H1 


2980 


3254 


6 


244366.6 


003727H1 


3043 


3400 


6 


244366.6 


003176H1 


3043 


3543 


6 


244366.6 


003684H1 


3043 


3424 


6 


244366.6 


003185H1 


3043 


3550 


6 


244366.6 


003182H1 


3043 


3493 


6 


244366.6 


003701H1


3043 


3471 


6 


244366.6 


003615H1 


3043 


3429 


6 


244366.6 


003188H1


3043 


3483 


6 


244366.6 


003127H1 


3043 


3453 


6 


244366.6 


003465H1 


3043 


3433 


6 


244366.6 


003422H1 


3043 


3380 


6 


244366.6 


5290617H1


3002 


3300 


6 


244366.6 


003521H1


3043 


3549 


6 


244366.6 


003294H1 


3043 


3411 


6 


244366.6 


003642H1 


3043 


3400 


6 


244366.6 


003646H1 


3043 


3405 


6 


244366.6 


003660H1 


3043 


3392 


6 


244366.6 


5138512H1 


3079 


3394 


6 


244366.6 


094605H1 


3081 


3258 


6 


244366.6 


1726602H1 


3114 


3335 


6 


244366.6 


1623474T6 


3120 


3724 


6 


244366.6 


2381350H1 


3122 


3380 


6 


244366.6 


768275H1 


3130 


3388 


6 


244366.6 


3495251H1


3143 


3350 


6 


244366.6 


3705742H1 


3161 


3533 


6 


244366.6 


4744563H1 


3175 


3471 


6 


244366.6 


3101288T6 


3197 


3715 


6 


244366.6 


5561382H1 


3201 


3503 


6 


244366.6 


074734H1 


3202 


3442 


6 


244366.6 


073329H1 


3202 


3403 


6 


244366.6 


1975441T6 


3222 


3706 


6 


244366.6 


3477744H1 


3244 


3416 


6 


244366.6 


2112529T6 


3250 


3723 


6 


244366.6 


g3933445 


3274 


3756 


6 


244366.6 


4861989H1 


3274 


3567 


6 


244366.6 


1737024F6 


3292 


3734 


6 


244366.6 


g2584374 


3285 


3757 


6 


244366.6 


1735490H1 


3292 


3561 


6 


244366.6 


1737024H1 


3292 


3554 


6 


244366.6 


393035H1 


3303 


3590 


6 


244366.6 


2158031F6


3307 


3758 
SEQ ID NO:


Template ID


Component ID


Start


Stop


6


244366.6


g44oooUo 


3oUo 


o/oo 


6 


244366.6 


2554055H1


3325 


3624 


6 


244366.6 


g2934389


3326 


3756 


6 


244366.6 


g4391414 


3341 


3756 


6 


244366.6 


4460645H1


3349 


3601 


6 


244366.6 


g2208607


3359 


3756 


6 


244366.6 


g2318342


3366 


0~7CX 

O/OO 


6 


244366.6 


g1062585


3370 


3743 


6 


244366.6 


g4081771


3372 


3761 


6 


244366.6 


g1925212


3381 


3757 


6 


244366.6 


4726758H1


3390 


3662 


6 


244366.6 


g2410378


3392 


3763 


6 


244366.6 


867419H1


3398 


3679 


6 


244366.6 


g616070


3402 


3764 


6 


244366.6 


g2163418 


3405 


3756 


6 


244366.6 


g2555602


3413 


3760 


6 


244366.6 


g561365 


3428 


3756 


6 


244366.6 


1889554T6


3433 


3725 


6 


244366.6 


g2336915 


3444 


3756 


6 


244366.6 


g616115


3445 


3756 


6 


244366.6 


g4435130 


3469 


3756 


6 


244366.6 


g4525507 


3502 


3757 


6 


244366.6 


2158031H1


3523 


3756 


6 


244366.6 


g2401624 


3528 


3765 


6 


244366.6 


g4268526 


3559 


3756 


6 


244366.6 


2009370H1 


3666 


3756 


6 


244366.6 


218073H1 


3682 


3756 


6 


244366.6 


2350763H1 


3699 


3760 


6 


244366.6 


1647483H1 


2520 


2769 


6 


244366.6 


2432662H1 


2526 


2757 


6 


244366.6 


901941R1 


2538 


3096 


6 


244366.6 


901941H1 


2538 


2895 


6 


244366.6 


901981H1 


2538 


2858 


6 


244366.6 


2052263H1 


2545 


2857 


6 


244366.6 


3483573H1 


2553 


2851 


6 


244366.6 


3565807H1


2558 


2822 


6 


244366.6 


g2278841 


2560 


2917 


6 


244366.6 


g2178439


2563 


2917 


6 


244366.6 


g2153824


2565 


2917 


6 


244366.6 


g1329145


2568 


2874 


6 


244366.6 


2198515H1 


2647 


2908 


6 


244366.6




264/ 


2/20 


6 


244366.6 


g1548506


2680 


3207 


6 


244366.6 


2321185H1 


2694 


2917 


6 


244366.6 


3244071T6


2568 


2798 


6 


244366.6 


2936492H1 


2700 


2917 


6 


244366.6 


600642H1 


2586 


2891 


6 


244366.6 


g1124072


2711 


2850 


6 


244366.6 


g1833465


2712 


2856 


6 


244366.6 


g4327019 


2736 


2851 
SEQ ID NO:


Template ID 


Component ID 


Start 


Stop 


6 


244366.6 


3165778H1 


2588 


2925 


6 


244366.6 


g2139164 


2oV4 




6 


244366.6 


633565H1 


2753 


2917 


6 


244366.6


1520269F6


2766 


3147 


6 


244366.6 


1520026H1 


2766 


2917 


6 


244366.6 


1520269H1 


2766 


2917 


6 


244366.6 


g4095144 


2611


2917 


6 


244366.6 


1237708H1 


2767 


3016 


6 


244366.6 


2768411H1


2770 


3013 


6 


244366.6 


g1321416


2615 


2858 


6 


244366.6 


2416809H1 


2800 


2917 


6 


244366.6 


1570846H1 


2811


3018 


6 


244366.6 


3898114H1 


2621 


2859 


6 


244366.6 


874422H1 


2824 


3131 


6 


244366.6 


g1349372


2825 


2952 


6 


244366.6 


2815235H1 


2842 


3116 


6 


244366.6 


4897079H1 


2925 


3201 


6 


244366.6 


1231478H1 


2632 


2861 


6 


244366.6 


2152887H1 


2927 


3042 


6 


244366.6 


3510024H1 


2635 


2917 


6 


244366.6 


1975441F6


2938 


3306 


6 


244366.6 


2189605H1 


2938 


3203 


6 


244366.6 


1975441H1 


2938 


30B8 


6 


244366.6 


3470739H1 


2938 


3153 


6 


244366.6 


g1844965


2643 


2917 


6 


244366.6 


1623474H1 


2155 


2382 


6 


244366.6 


1338343H1 


1799 


2055 


6 


244366.6 


2022520H1 


2157 


2423 


6 


244366.6 


805131H1 


2200 


2397 


6 


244366.6 


795024H1 


2202 


2393 


6 


244366.6 


1338343F6 


1799 


2241 


6 


244366.6 


1297158H1 


1810 


2050 


6 


244366.6 


3354386H1 


2223 


2491 


6 


244366.6 


2540345H1 


2224 


2461 


6 


244366.6 


2313137H1 


1887 


2152 


6 


244366.6 


2805024H1 


2233 


2536 


6 


244366.6 


2454581T6


2281 


2827 


6 


244366.6 


g570404 


1961 


2245 


6 


244366.6 


2773281H1


2282 


2528 


6 


244366.6 


3246058T6 


2284 


2807 


6 


244366.6 


3332425T6 


2286 


2817 


6 


244366.6 


3321733H1 


1988 


2109 


6 


244366.6 


g2153937


2324 


2754 


6 


244366.6 


1464642H1 


1995 


2224 


6 


244366.6 


g1319564


2331 


2938 


6 


244366.6 


3555988H1 


2005 


2304 


6 


244366.6 


3384030H1 


2043 


2317 


6 


244366.6 


g1898453


2332 


2760 


6 


244366.6 


g1062443


2337 


2746 


6 


244366.6 


3188982H1 


2348 


2686 
SEQ ID NO:


Template ID


Component ID


Start


Stop




6 


244366.6 


Z200/0yn 1 


ZUAO 


ZO 1 O 


6 


244366.6 


Oil OCOOU 1 

Z 1 1 ZOzyn 1 


ZOOO 


ZOZ i 


6 


244366.6 


3693 7o 1 rl 1 


Z0O4 


9AA9 
ZOOZ 


6 


244366.6 


1864803H1


OTkAO 

zuov 


ZOO 1 


6 


244366.6 


853003T6


ZOOO 


9ft 1 
ZO 1 0 


6 


244366.6 


g1925211


zlo/ 


ZO 10 


6 


244366.6 


2641982H1


z3oo 


zOVo 


6 


244366.6 


2516658H1 


2397 


zooo 


6 


244366.6 


1864803T6


2414 


Zoyo 


6 


244366.6 


1338343T6


2421 


ZO 14 


6 


244366.6 


1398258H1 


244U 


ZOo 1 


6 


244366.6 


1829874H1 


2471 


Z/OO 


6 


244366.6 


589137H1 


z4/o 


ZOOO 


6 


244366.6 


4855638H1 


2147 


z4 1 ) 


6 


244366.6 


1698592H1


2oOu 


070A 
Z/ZO 


6 


244366.6 


3524562H1 


2154 


zo4o 


6 


244366.6 


g1319504


ZOZO 


zyoz 


6 


244366.6 


1 6234 /4hO 


Z I DO 


9AA9 
ZOOZ 


7 


405313.4 


4640462H1 


0/0 


OO/ 


7 


405313.4 


g1774849


595 


y/y 


7 


405313.4 


4721077H1 


04 


i oyi 
i y4 


7 


405313.4 


5944975H1 


y 1 

61 


370 


7 


405313.4 


1948647H1 


596 


828 


7 


405313.4 


1592016H1


QA 
OO 


ZoZ 


7 


405313.4 


1948647R6 


596 


1 loo 


7 


405313.4 


g4070751 


686 


1 137 


7 


405313.4 


2384959H1


OO 


ZOO 


7 


405313.4 


4571373H1 


715 


978


7 


405313.4 


g954058 


893 


1203 


7 


405313.4 


1559555H1 


903 


I lzU 


7 


405313.4 


1559555F6


903 


Io6o 


7 


405313.4 


g617633


914 


IOTA 

1 0 lo 


7 


405313.4 


1 30297 /HI 


oO 


ZO/ 


7 


405313.4 


4215272H1 


y I v 


i jyo 


7 


405313.4 


g1492868


88 


zoU 


7 


405313.4 


46438 1 On 1 


OAO 

Voz 


1919 
1 Z IZ 


7 


405313.4 


965308H1


yoo 


IZOO 


7 


405313.4 




OAA 

yoo 


1 A99 
1 ozz 


7 


405313.4 


1321926T6 


loz 


400 


7 


405313.4 


5136028H1


Wo 


1 9AA 
IZOO 


7 


405313.4 


g4264253 


155 


Ann 
0U9 




405313.4 


g3739298 


156 


a i n 
OIL) 




405313.4 


4306178H1 


1008 


1207 




405313.4 


g1237752


1009 


1175 




405313.4 


4551446H1 


1047 


1310 




405313.4 


g4522654 


210 


522 




405313.4 


1628853H1 


1049 


1219 




405313.4 


1627193H1 


1049 


1261 




405313.4 


1316291H1 


349 


522 




405313.4 


1628853F6 


1049 


1649 
SEQ ID NO: Template ID


Component ID 


Start 


Stop 


7 405313.4 


1283759H1 


1053 


1330 


7 405313.4 


4312209H1 


1067 


1367 


7 405313.4 


4103086H1 


1115 


1239 


7 405313.4 


g1984348


1147


14/2 


7 405313.4 


g2540589 


1168


1616 


7 405313.4 


g1492809


369 


527 


7 405313.4 


3584223H1 


1177


1354 


7 405313.4 


102569H1 


458 


607 


7 405313.4 


4829437H1 


540 


737 


7 405313.4 


3674811H1


1186


1495 


7 405313.4 


1733325H1 


1192


1412 


7 405313.4 


g1984560


561 


804 


7 405313.4 


g2010449 


2002 


2335 


7 405313.4 


2682711H1


2014 


2309 


7 405313.4 


g4535531 


2016 


2337 


7 405313.4 


g1773873


2022 


2340 


7 405313.4 


g4137010


2025 


2346 


7 405313.4 


4111488H1 


2026 


2288 


7 405313.4 


2153069H1 


2034 


2315 


7 405313.4 


870657R1 


2041 


2613 


7 405313.4 


870657H1 


2041 


2250 


7 405313.4 


5433876H1 


2063 


2317 


7 405313.4 


g3162264


2073 


2338 


7 405313.4 


g3057393 


2080 


2337 


7 405313.4 


g3872586 


2081 


2335 


7 405313.4 


2081907T6 


2084 


2288 


7 405313.4 


659926H1 


2097 


2337 


7 405313.4 


056422H1 


2097 


2317 


7 405313.4 


3486469H1 


2105 


2337 


7 405313.4 


817086R1 


2127 


2337 


7 405313.4 


817086H1 


2127 


2388 


7 405313.4 


817086T1 


2127 


2280 


7 405313.4 


g3597649 


2142 


2335 


7 405313.4 


3988965H1 


2205 


2499 


7 405313.4 


2664989H1 


1633 


1850 


7 405313.4 


3962192H1 


1645 


1776 


7 405313.4 


2290557H1 


1645 


1921 


7 405313.4 


4115475H1 


1647 


1861 


7 405313.4 


g670108 


1649 


1960 


7 405313.4 


g570685 


1650 


1964 


7 405313.4 


2213032H1 


1653 


1919 


7 405313.4 


1667188H1 


1653 


1775 


7 405313.4 


2285586H1 


1653 


1841 


7 405313.4 


990792H1 


1669 


1968 


7 405313.4 


1283707T6 


1682 


2306 


7 405313.4 


3781966H1 


1686 


2020 


7 405313.4 


2402093H1 


1700 


1946 


7 405313.4 


3666248H1 


1707 


1866 


7 405313.4 


1948647T6 


1711 


2289 


7 405313.4 


5681549H1 


1746 


2012 
SEQ ID NO:


Template ID 


Component ID 


Start 


Stop


7 


405313.4 


2885455T6 


1752 


2292 


7 


405313.4 


1301039H1 


1790 


2095 


7 


405313.4 


1669180T6 


1799 


2303 


7 


405313.4 


1669180H1 


i ono 
1 oUo 




7 


405313.4 


3436329H1 


1815 


1956 


7 


405313.4 


g1624740


1828 


2218 


7 


405313.4 


1559555T6


loo 1 


zoUz 


7 


405313.4 


3106278H1 


1846 


2121 


7 


405313.4 


2323323T6 


1848 


2308 


7 


405313.4 


1914641H1


1 O A Q 

1 o4o 


2Uoz 


7 


405313.4 


1628853T6 


1854 


2300 


7 


405313.4 


g2946487 


1855 


2335 


7 


405313.4 


2081907F6 


1854 


2313 


7 


405313.4 


2081917H1 


1854 


2015 


7 


405313.4 


g672957 


1859 


2199 


7 


405313.4 


2916179H1 


1878 


2177 


7 


405313.4 


3556793T6 


1881 


2306 


7 


405313.4 


g2783325 


1885 


2345 


7 


405313.4 


1268187F1 


1888 


2344 


7 


405313.4 


1268187H1 


1888 


2152 


7 


405313.4 


1268187F6 


1888 


2203 


7 


405313.4 


1268187T6 


1890 


2317 


7 


405313.4 


g616527 


1892 


2244 


7 


405313.4 


g3593850 


1896 


2335 


7 


405313.4 


g573001 


1897 


2264 


7 


405313.4 


g815353


1898 


2253 


7 


405313.4 


g4083770 


1902 


2335 


7 


405313.4 


g4281927 


1912 


2337 


7 


405313.4 


g2753877 


1930 


2345 


7 


405313.4 


2199211H1


1930 


2188 


7 


405313.4 


2375908H1 


1934 


2181 


7 


405313.4 


g3797979 


1954 


2337 


7 


405313.4 


2653312H1


1963 


2221 


7 


405313.4 


g4265408 


1967 


2342 


7 


405313.4 


g3919084


1969 


2335 


7 


405313.4 


g4085496 


1970 


2334 


7 


405313.4 


g668546 


1976 


Z 1 DO 


7 


405313.4 


2149388H1 


1985 


22/4 


7 


405313.4 


2601556H1 


1995 


2280 


7 


405313.4 


g2789326 


2898 


3179 


7 


405313.4 


g646138 


2913 


3179 


7 




g888680


2916 


3206 


7 


405313.4 


g646137 


2924 


3179 


7 


405313.4 


g645108 


2927 


3179 


7 


405313.4 


g917579 


2933 


3178 


7 


405313.4 


g3051580 


2943 


3179 


7 


405313.4 


g3764150 


2943 


3179 


7 


405313.4 


g2903435 


2947 


3179 


7 


405313.4 


g1225232


2947 


3179 


7 


405313.4 


g1087854


2947 


3111 
SEQ ID NO:


Template ID 


Component ID 


Start 


Stop 


7 


405313.4 


g1647814


2998 


3213 


7 


405313.4 


g917686 


3058 


3210 


7 


405313.4 


1226453T1 


3087 


3168 


7 


405313.4 


1226453H1 


3087 


3167 


7 


405313.4 


g3174481


3089 


3178 


7 


405313.4 


g646728 


2878 


3179 


7 


405313.4 


g884856 


2884 


3211 


7 


405313.4 


g815354 


2895 


3216 


7 


405313.4 


2714769H1 


2231 


2345 


7 


405313.4 


1440914H1 


2252 


2421 


7 


405313.4 


1440914F6 


2252 


2675 


7 


405313.4 


43U 940H1 


2263 


2546 


7 


405313.4 


6073069H1 


2266 


2498 


7 


405313.4 


549605H1 


2285 


2554 


7 


405313.4 


2157285H1 


2287 


2523 


7 


405313.4 


2323179H1


2324 


2460 


7 


405313.4 


2323108H1 


2324 


2581 


7 


405313.4 


g2197270


2337 


2696 


7 


405313.4 


2294826H1 


2444 


2517 


7 


405313.4 


g1043992


2444 


2683 


7 


405313.4 


3860110H1


2444 


2682 


7 


405313.4 


3246952H1 


2472 


2725 


7 


405313.4 


1956178H1 


2480 


2773 


7 


405313.4 


4312B86H1 


2495 


2794 


7 


405313.4 


g770052 


2623 


2930 


7 


405313.4 


1415784H1 


2660 


2922 


7 


405313.4 


g884855 


2666 


3025 


7 


405313.4 


2014171H1 


2668 


2943 


7 


405313.4 


g888679 


2667 


3021 


7 


405313.4 


2224791H1


2686 


2955 


7 


405313.4 


4106915H1 


2695 


2996 


7 


405313.4 


g2217789 


2772 


3181 


7 


405313.4 


g2874275 


2772 


3179 


7 


405313.4 


1440914R1 


2773 


3179 


7 


405313.4 


g4075424 


2775 


3179 


7 


405313.4 


g4328896 


2776 


3179 


7 


405313.4 


287748H1 


2778 


3143 


7 


405313.4 


3496950H1 


2802 


3087 


7 


405313.4 


2750652H1 


2806 


3105 


7 


405313.4 


g765774 


2820 


3182 


7 


405313.4 


g3895056 


2831 


3179 


7 


405313.4 


g4450984 


2833 


3179 


7 


405313.4 


g2099917 


2853 


3348 


7 


405313.4 


g2018248 


2860 


2990 


7 


405313.4 


g564562 


2869 


3179 


7 


405313.4 


g2459191 


2869 


3206 


7 


405313.4 


g1099005


2521 


2798 


7 


405313.4 


2731146H1 


2559 


2794 


7 


405313.4 


g1198836


2578 


2840 


7 


405313.4 


2229784H1 


2603 


2852 
SEQ ID NO:


Template ID


Component ID


Start 


Stop 


/ 


/lfV\71 7 /I 
4UOO 1 0.4 


999QS4AH 1 


2603 


2858 


7 


4UO0 10.4 


1 701 ooaui 
1 oz 1 yzon 1 


i 




7 


4UOO I0.4 


1 001 OOAPA 

1 oz 1 yzoro 


i 
i 


\J\JsJ 


-7 

7 


4UOO 1 0.4 


i 777 1 rvii-n 
1 00/ I UAn 1 


a 
o 


96S 

tUv 


7 


4000 10.4 


oaaoi i 7uii 
zooz 1 1 on 1 


A 
0 


9A1 


7 


405313.4 


^o7£/jorr 
gz/o4Zoo 


1 1 y i 


1 Ao4 


7 


405313.4 


404 1 400n 1 


I zuo 


1*4/ VJ 


7 


405313.4 


^OCRDl *^0 

gZooo 1 oz 


i zuy 


1 A77 
1 0/ o 


7 


405313.4 


/-»oR/inA7ft 

gzo4Uooo 


101 A 
I Z 10 


1A1 A 


7 


405313.4 


1 OQOnORLJ 1 

1 zozuyon 1 


I zzo 


l/A/1 


7 


405313.4 


IZOO/U/rl 1 


1 ZZO 


1 ^1 1 

1 0 I 1 


7 


405313.4 


1 ooon/iflui 
1 ZoZU4on I 


1 ZZO 


ISflA 
1 ouu 


/ 


/1HC7 17/1 

4UO0 1 o.4 


1 0A77PI7PA 
1 ZOO/U/rO 




16*SQ 


7 


405313.4 


oZouzyOn 1 


1 ZO^J 


1 ol 1 
1 O I 1 


7 


405313.4 


oovo/oon I 


10/11 
1 Z4 1 


1 701 
1 OY 1 


7 


405313.4 


oU/Z 1 ion 1 


1 007 

i zyo 


1 o^A 


7 


405313.4 


CO 1 OA/ t\U 1 

OZ I yo40n 1 


1 710 

i o i y 


1/1/17 

1 *4**0 


7 


405313.4 


1 oVoUU 1 H 1 


1717 
10 1/ 


1 Al A 
1 0 1 0 


7 


405313.4 


0 AftAfiOAU 1 

OoyooZoH 1 


1 71 0 

i o i y 


1 A1.4 
10 1*4 


7 


405313.4


5920 lyzn 1 


1 71 O 

i o iy 


1 A/17 
104/ 


7 


405313.4 


g3770003 


I oZz 


louy 


7 


405313.4 


5665259H1


loZ/ 


1 OOO 


7 


405313.4 


31ol49oHl 


1 70A 

lOZO 


1 A0R 
1 OZO 


7 


405313.4 


33o/2ooHZ 


1 7/1^ 
1 04O 


1 o/Yl 
1 OUU 


7 


405313.4 


oyzooorl 1 


1 7<n7 

1000 


1 AH /I 
1 OU4 


7 


405313.4 


o4UU0U0n 1 


1 7AA 
1 OOO 


1 A97 
1 OZO 


7 


405313.4 


z32oozorl 1 


1 70^ 

i oyo 


1 Ao7 
1 00/ 


7 


405313.4 


2323323R6 


I oyo 


1 QOR 

i oyo 


7 


405313.4 


46ZZ0 1 Zn 1 


1 Af\A 


171 A 
i / 1 0 


7 


405313.4 


5373496H1 


I40O 


1 7 on 
I /uu 


7 


405313.4 


ao / 1 onu i 
6Z4 1 ZUri 1 


1 A RA 
1400 


1 70A 

1 /zo 


7 


405313.4 


Z0004oUn 1 


1 /IAO 

i ^oy 


1 7oTI 


7 


405313.4 


oil ^/tn^u l 
Z 1 1 04UOH 1 


I ouo 


1 AD1 
1 OU 1 


7 


405313.4 


gyo4uoy 


1 ^1 A 
1 0 io 


MAR 


/ 


4UO0 1x3.4 


00O0RA1 Wl 

zzyzoo i n i 


1 o7A 
l OOO 




/ 


4UO0 10.4 


0RRA0A7M1 

zooozo/n i 


i sAn 


lft49 


/ 


/1HR7 17/1 
4U00 10.4 


nAAA07 c i 

yoooy/ o 


i Ain 

IU IU 


1 f\JsJ 


7 


/inci 1 1 /1 
4UO0 1 0.4 


flAlOl 1 Dl 

oo i y 1 i \< i 


1 A10 


9onn 


7 


4UOO 10.4 


OAlOl 1 kll 
OO 1 y I Inl 


1 A10 


187<S 
1 o/o 


7 


4UOO 1 0.4 


go/ ozoo 


1 AO A. 
1 OZO 




8 


436857.2 


Z0ZOO4r 1 


1 ^A7 
140/ 


1 tOt 


0 

0 


/17AQt;7 O 
400O0/.Z 


07n/JAAnTA 




1955 


8 


436857.2 


g4373224 


1522 


1968 


8 


436857.2 


g4112872


1522 


1954 


8 


436857.2 


g4390509 


1556 


1962 


8 


436857.2 


5858881H1


1624 


1888 


8 


436857.2 


5267222H1 


1653 


1883 


8 


436857.2 


1477850T6 


1680 


2225 


8 


436857.2 


4617960T6 


1761 


2215 


8 


436857.2 


g917596 


1901 


2223 
SEQ ID NO:


Template ID 


Component ID 


Start 


Stop 


8 


436857.2 


3270104H1


1906 


2161 


8 


436857.2 


5782044H1 


2003 


2209 


8 


436857.2 


g4268407 


2010 


2240 


8 


436857.2 


g3051680 


2063 


2241 


8 


436857.2 


481468H1 


2063 


2250 


8 


436857.2 


5683178H1


1 


212 


8 


436857.2 


1477850H1 


56 


278 


8 


436857.2 


1477850F6 


56 


488 


8 


436857.2 


4619212H1 


138 


290 


8 


436857.2 


g1978924


207 


556 


8 


436857.2 


3758509H1 


206 


498 


8 


436857.2 


4617960H1 


323 


550 


8 


436857.2 


4718961H1


323 


555 


8 


436857.2 


4617960F6 


323 


761 


8 


436857.2 


1992924H1 


366 


651 


8 


436857.2 


g4269060 


528 


627 


8 


436857.2 


4255690H1 


599 


830 


8 


436857.2 


4761770H1 


727 


1004 


8 


436857.2 


4613106H1 


795 


1034 


8 


436857.2 


g2000739 


795 


1025 


8 


436857.2 


2704880H1 


883 


1176 


8 


436857.2 


2704880F6 


883 


1313 


8 


436857.2 


2707669H1 


961 


1264 


8 


436857.2 


805609H1 


1054 


1283 


8 


436857.2 


805609R1 


1054 


1630 


8 


436857.2 


4135963H1 


1113 


1415 


8 


436857.2 


4294249H1 


1152 


1398 


8 


436857.2 


5450335H1 


1156 


1421 


8 


436857.2 


5373082H1 


1203 


1419 


8 


436857.2 


4190554H1 


1247 


1412 
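
The table above is flattened to one cell per line, in the repeating order SEQ ID NO, Template ID, Component ID, Start, Stop. As a minimal sketch, the cells can be regrouped into rows and a span computed for each component. The names `Component` and `group_rows` are hypothetical, and the length formula assumes that Start and Stop are 1-based, inclusive positions on the template, which the table itself does not state.

```python
from typing import Iterable, List, NamedTuple


class Component(NamedTuple):
    """One table row: a component sequence placed on a template."""
    seq_id: int        # SEQ ID NO of the template
    template_id: str   # e.g. "436857.2"
    component_id: str  # e.g. "5373082H1"
    start: int
    stop: int

    @property
    def length(self) -> int:
        # Assumption: Start and Stop are 1-based and inclusive.
        return self.stop - self.start + 1


def group_rows(cells: Iterable[str]) -> List[Component]:
    """Regroup one-cell-per-line text into five-column rows."""
    values = [c.strip() for c in cells if c.strip()]  # drop blank lines
    if len(values) % 5 != 0:
        raise ValueError("expected a multiple of five non-empty cells")
    rows = []
    for i in range(0, len(values), 5):
        seq_id, template, component, start, stop = values[i:i + 5]
        rows.append(Component(int(seq_id), template, component,
                              int(start), int(stop)))
    return rows


# The last two rows of the table above, as they appear in the text.
cells = ["8", "", "", "436857.2", "", "", "5373082H1", "", "", "1203", "", "", "1419",
         "8", "", "", "436857.2", "", "", "4190554H1", "", "", "1247", "", "", "1412"]
rows = group_rows(cells)
for row in rows:
    print(row.component_id, row.start, row.stop, row.length)
```

Rows whose cells were damaged in extraction will fail the integer conversion, which is a convenient way to locate them.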



CLAIMS

What is claimed is: 

1. An isolated polynucleotide comprising a polynucleotide sequence selected from the group
consisting of:

a) a polynucleotide sequence selected from the group consisting of SEQ ID NO: 1-14, 

b) a naturally occurring polynucleotide sequence having at least 90% sequence identity to a 
polynucleotide sequence selected from the group consisting of SEQ ID NO: 1-14, 

c) a polynucleotide sequence complementary to a),

d) a polynucleotide sequence complementary to b), and 

e) an RNA equivalent of a) through d). 

2. An isolated polynucleotide of claim 1, comprising a polynucleotide sequence selected
from the group consisting of SEQ ID NO: 1-14.

3. An isolated polynucleotide comprising at least 60 contiguous nucleotides of a 
polynucleotide of claim 1.

4. A composition for the detection of expression of disease detection and treatment molecule
polynucleotides comprising at least one of the polynucleotides of claim 1 and a detectable label.

5. A method for detecting a target polynucleotide in a sample, said target polynucleotide 
having a sequence of a polynucleotide of claim 1, the method comprising: 
a) amplifying said target polynucleotide or fragment thereof using polymerase chain reaction
amplification, and

b) detecting the presence or absence of said amplified target polynucleotide or fragment 
thereof, and, optionally, if present, the amount thereof. 

6. A method for detecting a target polynucleotide in a sample, said target polynucleotide
comprising a sequence of a polynucleotide of claim 1, the method comprising:

a) hybridizing the sample with a probe comprising at least 20 contiguous nucleotides 
comprising a sequence complementary to said target polynucleotide in the sample, and which probe 
specifically hybridizes to said target polynucleotide, under conditions whereby a hybridization
complex is formed between said probe and said target polynucleotide, and

b) detecting the presence or absence of said hybridization complex, and, optionally, if
present, the amount thereof.

7. A method of claim 5, wherein the probe comprises at least 30 contiguous nucleotides. 

8. A method of claim 5, wherein the probe comprises at least 60 contiguous nucleotides. 

9. A recombinant polynucleotide comprising a promoter sequence operably linked to a 
polynucleotide of claim 1.

10. A cell transformed with a recombinant polynucleotide of claim 9. 

11. A transgenic organism comprising a recombinant polynucleotide of claim 9.

12. A method for producing a disease detection and treatment molecule polypeptide, the
method comprising:

a) culturing a cell under conditions suitable for expression of the disease detection and 
treatment molecule polypeptide, wherein said cell is transformed with a recombinant polynucleotide 
of claim 9, and 

b) recovering the disease detection and treatment molecule polypeptide so expressed.

13. A purified disease detection and treatment molecule polypeptide (MDDT) encoded by at 
least one of the polynucleotides of claim 2. 

14. An isolated antibody which specifically binds to a disease detection and treatment
molecule polypeptide of claim 13.

15. A method of identifying a test compound which specifically binds to the disease 
detection and treatment molecule polypeptide of claim 13, the method comprising the steps of: 
a) providing a test compound;

b) combining the disease detection and treatment molecule polypeptide with the test 
compound for a sufficient time and under suitable conditions for binding; and 

c) detecting binding of the disease detection and treatment molecule polypeptide to the 
test compound, thereby identifying the test compound which specifically binds the disease detection
and treatment molecule polypeptide.
16. A microarray wherein at least one element of the microarray is a polynucleotide of claim 3.

17. A method for generating a transcript image of a sample which contains polynucleotides, 
the method comprising the steps of: 

a) labeling the polynucleotides of the sample, 

b) contacting the elements of the microarray of claim 16 with the labeled polynucleotides of 
the sample under conditions suitable for the formation of a hybridization complex, and 

c) quantifying the expression of the polynucleotides in the sample. 

18. A method for screening a compound for effectiveness in altering expression of a target
polynucleotide, wherein said target polynucleotide comprises a polynucleotide sequence of claim 1, 
the method comprising: 

a) exposing a sample comprising the target polynucleotide to a compound, and 

b) detecting altered expression of the target polynucleotide. 

19. A method of claim 6 for toxicity testing of a compound, further comprising 

(c) comparing the presence, absence or amount of said target polynucleotide in a first 
biological sample and a second biological sample, wherein said first biological sample has been 
contacted with said compound, and said second sample is a control, whereby a change in presence, 
absence or amount of said target polynucleotide in said first sample, as compared with said second 
sample, is indicative of toxic response to said compound. 
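
The sequence relationships recited in claims 1 and 6 (complementary sequence, RNA equivalent, and a probe with a minimum run of contiguous complementary nucleotides) can be illustrated with a short sketch. This is illustrative only and is not part of the claimed subject matter. All function names are hypothetical, and exact string matching is a deliberate simplification that does not model hybridization conditions. The example target is the first 60 nucleotides of SEQ ID NO: 1 as printed in the sequence listing.

```python
# Lowercase DNA alphabet as used in the sequence listing; "n" marks an unsure base.
_COMPLEMENT = str.maketrans("acgtn", "tgcan")


def reverse_complement(dna: str) -> str:
    """Return the complementary strand, read 5' to 3'."""
    return dna.lower().translate(_COMPLEMENT)[::-1]


def rna_equivalent(dna: str) -> str:
    """Return the RNA equivalent of a DNA sequence (t replaced by u)."""
    return dna.lower().replace("t", "u")


def has_complementary_stretch(probe: str, target: str, min_len: int = 20) -> bool:
    """True if at least min_len contiguous probe nucleotides are exactly
    complementary to some stretch of the target.

    Assumption: perfect base pairing only, with no mismatches or gaps.
    """
    probe_rc = reverse_complement(probe)
    target = target.lower()
    return any(
        probe_rc[i:i + min_len] in target
        for i in range(len(probe_rc) - min_len + 1)
    )


# First 60 nucleotides of SEQ ID NO: 1, with spacing removed.
target = "agtgattgcatgagcttagggaggggagtgacatgatctgatttacgtctgtggaagacc"

# A hypothetical 25-nucleotide probe complementary to positions 11-35.
probe = reverse_complement(target[10:35])

print(reverse_complement("agtgattgca"))
print(rna_equivalent("agtgattgca"))
print(has_complementary_stretch(probe, target))
print(has_complementary_stretch(probe, target, min_len=30))
```

The `min_len` parameter corresponds to the 20, 30, and 60 nucleotide thresholds recited in the claims. A probe shorter than the threshold can never satisfy it, which the last call demonstrates.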
SEQUENCE LISTING



<110> INCYTE GENOMICS, INC.
Hodgson, David M. 
Lincoln, Stephen E. 
Russo, Frank D. 
Spiro, Peter A. 
Banville, Steven C. 
Bratcher, Shawn R. 
Dufour, Gerard E. 
Cohen, Howard J. 
Rosen, Bruce H. 
Chalup, Michael S. 
Hillman, Jennifer L.
Jones, Anissa L. 
Yu, Jimmy Y. 
Greenawalt, Lila B. 
Panzer, Scott R. 
Roseberry, Ann M. 
Wright, Rachel J. 
Daniels, Susan 2. 



<120> MOLECULES FOR DISEASE DETECTION AND TREATMENT 



<130> PT-1042 PCT 

<140> To Be Assigned 

<141> Herewith 

<150> US 60/137,412; 60/147,500; 60/147,501; 60/147,542 

<151> 1999-06-03; 1999-08-05; 1999-08-05; 1999-08-05 



<160> 14 

<170> PERL Program 

<210> 1 
<211> 3101 
<212> DNA 

<213> Homo sapiens 
<220> 

<221> misc_feature 

<223> Incyte ID No: 222197.6 

<220> 

<221> unsure 

<222> 3077, 3084, 3093, 3097-3098 
<223> a, t, c, g, or other 

<400> 1 

agtgattgca tgagcttagg gaggggagtg acatgatctg atttacgtct gtggaagacc 60
actctgggtg ctgcatgggg gactggactg
aggactggtc agcaggtgat ctaagcactc 
ccttaagatc ctgaagaaac ggcacaaaat 
gggtgcgtca gggaaatcat gcagccatca 
cctctcctgg ctgaaaatga caactatgac 
gtggctgacc gggtctggtt catccgtgac 
tggcttctgg tcgcctatgc agacttcgtg 
gacttctggt actctgtggt caacggggtc 
tcatcccacc tgagaaccat gctcaccgac 
aaagaataca tggagagctt gcagctgaag 
tgctgctgta ttaaacccga gcgcgcccac 
aaaatggatc atcactgccc gtgggtgaac 
tttgtgctct tcactatgta tatagctctg 
tttcagttca tctcctgtgt ccgagggcag 
ataactgtaa tcctgttgat cttcctgtgc 
gcagttatgt ttggcaccca aatccactcc 
ttgaaaagtg agaagcccac atgggagcgg 
tttggggggc ccccctcact cctctggatg 
ctgcccacga gacccagaaa aggtggcccg 
actgaaactt gctcacagac ttccagttat 
catctgtgac caacagggca actggaacct 
ttttatatat ttatagtcac agatggcaga 
caacggaaag gtgtgtggcc acacgaagaa 
ggcttctgtg gagaatactt cgggttatta 
tgctgtttcc aatcatgaag aaaaacagtg 
catttcaggg ggctcctgct gaccccgcca 
ggcgtttaca tagaaagacg ttttggtctc 
caaaagatct gtgcactgaa cagtgaaggt 
agagacatcc tttgaccctc tcagcaagtc 
cgtgtgtgca tgtgtgtcaa aattgccagt 
gtgtacagca aacaagctat tttttagaaa 
gcggggtcct gcccgtggtt actatgaatg 
gaacagccgt tcctgtgcgg cccttcgttg 
tgagtgcttg agggccttgg aactgatttt 
tataaatggc acctaggtaa gagcagagct 
cccgatgctg tgtgtggggc aggggaggca 
gggcacctct gggaagggga ggggaccatg 
gccaccgcag gcagcgctcc agtccgggaa 
tttggttagc ttttacgttt tcttctccac 
ggtgcggact ttaaaattat gccagaaagc 
aacttgcctg gtttgtacat tttttgccgg 
tgagggtctt cctttatgct tgccctccac 
aattgttttg cctttctgtt catctgtgaa 
catccgaatc agggcttcta ccactgctga 
accgtcctag ccaagcgagc aaacctgcag 
aagcgtgtgc gccgttgggg gaagagctgc 
tcctggagac gccagtttcc gagattgttc 
ttggacatgc gtgtgggctt cagtgtgagg 
aacaattatc caagtggttg aatcctgtga 
cttgactttt caacctcttc tttcaatgta 
cagttctcaa aaaaaanaaa gggnggccgc 

<210> 2 

<211> 2561 

<212> DNA 

<213> Homo sapiens 

<220> 

<221> misc_f eature 

<223> Incyte ID No: 227709.3 



ttgggtgagc agaggcatga gagtagagag 120 
cccagatccg atcacatagg acagtatgca 180 
gttcaagtga tgtttagaaa taacttgtga 240 
ggacacaggc tccgggacgt cgagcatcat 3 00 
tcttcatcgt cctcctcctc cgaggctgac 360 
ggctgcggca tgatctgtgc tgtcatgacg 420 
gtgactttcg tcatgccgct gccttccaaa 480 
atctttaact gcttggccgt gcttgccctg 540 
cctggggcag tacccaaagg aaacgctacg 600 
cccggggaag tcatctacaa gtgccccaag 660 
cactgcagta tttgcaaaag atgtattcgg 720 
aattgtgtag gagaaaagaa tcaaagattt 780 
tcttcagtcc atgctctgat cctttgtgga 840 
tggactgaat gcagtgattt ttcacctccg 900 
cttgagggtc ttctgttttt cactttcact 960 
atatgcaacg acgagacgga gatcgagcga 1020 
aggctgcgat gggaagggat gaagtccgtc 1080 
aatccctttg tgggcttccg atttaggcga 1140 
gagttctcag tgtgaggcgt ggctcatcag 1200 
ttatttgggg tctgaaggat atcaacagct 1260 
acacaaacca attgcttgca gcaagcagag 132 0 
ggaagaggct ctcagtcccc acctgtacaa 13 8 0 
gccaaacgcc gtggcctcct gcagagctgg 1440 
catgggttat tcaaatcctg ggtcctgagc 1500 
aatccagtga acagggattc tccaagcagt 1560 
ctcagcagtg cactccccgg atcacagcag 1620 
gattagctcc gatgctttgc gctgaagttg 1680 
ggcttccggc acactccccg ctgccccgga 174 0 
tgtgtgtgtg cgtgtctgtg cgtgtgcgcg 1800 
gttgtttagg caatgtaaca tttaccggct 1860 
ccgacgtttc agggaagagg ggagagagcc 1920 
tattgctgtt ggaggacatc tcgatccaaa 1980 
ccctcctgct ttcatttttt aaagaaatct 2040 
tttttttttg ttccagccaa attagcagtg 2100 
gcggctcggt gacttgatac ttggggcagc 2160 
tccttactgg agaggcaggg cccagccatt 2220 
aggcagccag cccctggcag gggcgactgt 2280 
tggccaggat ggcgccctct tgttggagtt 2340 
ccacggcaca ggtgataaaa taggatcctt 2400 
caacagctcc cctcgtgggg ccttgcctta 2460 
acgcatcaag aagcaatctg tgacaaagtc 2520 
actaagagaa gttggcgtct ccctcctggg 2580 
ctgttttttg tttttaatta ctctgtaccc 2640 
tgcaaaacca caaagggacc tacctgagcc 2700 
ggggtttgga agtggacttg gtcaccgcag 2760 
gtcacagcca gagggacaaa gtgtgggtga 2820 
tgcatattca tttgcacatt gttgtctggg 2880 
cttttaatat gtatatcctg ttatcaataa 2940 
gacttggcaa gtgtgtgcaa atcaagtata 3000 
acttttatat gaaataaagt aatcaattaa 3 060 
cgnctannga g 3101 
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<220> 

<221> unsure 

<222> 126, 2144 

<223> a, t, c, g, or other 

<400> 2 

gcgggcgcgt cgccctctgc ccccgccggc accctggcca tgacaggcaa gtcggtgaag 60
gacgtggatc ggtaccaggc tgtcctggcc aacctgctgc tggaggagga taacaagttt 120
tgtgcngatt gccagtctaa agggccgcga tgggcctctt ggaacattgg tgtgttcatc 180
tgcattcgat gtgctggaat ccacaggaat ctgggggtgc acatatccag ggtaaagtca 240
gttaacctcg accagtggac tcaagaacag attcagtgca tgcaagagat gggaaatgga 300
aaggcaaacc gactttatga agcctatctt cctgagacct ttcggcgacc tcagatagac 360
ccagctgttg aaggatttat tcgagacaaa tatgagaaga agaaatacat ggaccgaagt 420
ctggacatca atgcctttag gaaagaaaaa gatgacaagt ggaaaagagg gagcgaacca 480
gttccagaaa aaaaattgga acctgttgtt tttgagaagg tgaaaatgcc acagaaaaaa 540
gaagacccac agctacctcg gaaaagctcc ccgaaatcca cagcgcctgt catggatttg 600
ttgggccttg atgctcctgt ggcctgctcc attgcaaata gtaagaccag caatacccta 660
gagaaggatt tagatctgtt ggcctctgtt ccatcccctt cttcttcggg ttccagaaag 720
gttgtaggtt ccatgccaac tgcagggagt gccggctctg ttcctgaaaa tctgaacctg 780
tttccggagc cagggagcaa atcagaagaa ataggcaaga aacagctctc taaagactcc 840
attctttcac tgtatggatc ccagacgcct caaatgccta ctcaagcaat gttcatggct 900
cccgctcaga tggcatatcc cacagcctac cccagcttcc ccggggttac acctcctaac 960
agcataatgg ggagcatgat gcctccacca gtaggcatgg ttgctcagcc aggagcttct 1020
gggatggttg cccccatggc catgcctgca ggctatatgg gtggcatgca ggcatcaatg 1080
atgggtgtgc cgaatggaat gatgaccacc cagcaggctg gctacatggc aggcatggca 1140
gctatgcccc agactgtgta tggggtccag ccagctcagc agctgcaatg gaaccttact 1200
cagatgaccc agcagatggc tgggatgaac ttctatggag ccaatggcat gatgaactat 1260
ggacagtcaa tgagtggcgg aaatggacag gcagcaaatc agactctcag tcctcagatg 1320
tggaaataaa aacaaaacac ctgtatggct gccattctct tcagccctgc gctctcccct 1380
ttccacagcc tccacccctg acccccatcc tcttttccta cctctctgtt tggtttagaa 1440
attgctcaat aagtcatttg gggtttggca tcctgcccag ccacttccca aacatgaaga 1500
cctctctgtt gctttatgtt gtacatgccc catagccatc ccaacgtcct ccccagtcct 1560
ctcctggcac cagcacctta gaagttgttg gcagaaggca cttaaactgt gggagaagtg 1620
tgcacacctt tgagtccctt ccctcaaggt taaagctcct gtcagactct cagaagggtc 1680
tgtgggtgtt gtatattagg caaacagggg aaagcttaga ggtccttcta tatgtgttaa 1740
taagctgttt ctaagtgttt aaatttgaaa agcatcatgt tctcatgatt tatgggaatg 1800
aagcaagtac tgaaatcaaa ttaaatactc cctgggtcct gggtcagttt gaccctagcc 1860
ctggggtgag gcaagccccc tcctatgagg atgagcaaaa atactactct cttcgccctg 1920
agttgctttc tggatctggg gcttcaggac ttgctgcttc agtcagcctt tattagcacc 1980
aaagacttta tgaagatccc acacacagac acacatccct tcccgcctcc cccctgcctt 2040
cagtaggatc tggctccgtg gctggaggac caacccctat agtgggaatg cagagcttaa 2100
cgtgtactgc ttgtgtgtgt gcgtgaagtg tgtgtgtgtg taanaagtgt gtgttccgcc 2160
tcccaccctc tccccatctg ctctgggtat ttttgttttt gtttagtttt aggtttacaa 2220
cagagaggaa ttaatttatc agcagcctaa aactgttgtg tttttcttat ggtttaaaaa 2280
acgccatgtc attgataact ccctttctcc cttcccttct cccggtctgc tgatcactct 2340
ttcatgcctg tgtatccagg gtgctctgtt tccccaccgt tcccaggtgt acgaggcaga 2400
gggccgggac agctttcctc tcagtcattg ttcaccccac ttgaaaattc agacaagaaa 2460
actttgctta aaagatttca tgtgtgggaa ccacagttcc tggctgcctt tctcctgtgt 2520
atgtgtaaat tccttaataa atattgcagg gaaggactgt t 2561



<210> 3 

<211> 2710 

<212> DNA 

<213> Homo sapiens 

<220> 

<221> misc_feature

<223> Incyte ID No: 237703.2 



<220> 

<221> unsure 
<222> 712, 799, 2332, 2334, 2342, 2470, 2611, 2682
<223> a, t, c, g, or other

<400> 3

caggtacctt attacattat tttatttaat catcatgtat ccctataagt aatattcttc 60
tcttttatag agtaagaaat tgagattcag gaatattaat ttgcccagga tcaccgaagt 120
ggaatgaaca tcaaaagcct attccctctg cttggccact tccacctcat tttactaagt 180
ttccccatgt ctgtgttagt aaactaataa ctaaaagggt ctcgcatttt aaatagcttt 240
ttaacccaag agcatgccac atttaaccag aggcccatag aacaaactga aaattacaac 300
ctaaaaggtt gtttctaagg ttgtattgag aaggaattga gctcttgaat ccctagaatt 360
ccttattaat actttattct tctgttaaaa gttttatttt taaaagtttc atacagtgtg 420
tatattggtg tgataatcct acagaaaaat caagcagtta tgttttcttc acagataaca 480
cataaaatat taaacagaaa gcctatgtta ttcattggac tgaagctttt atgcaataaa 540
ccttagttgg accaggagta aatgtatggt ttgatattca gagaatctca ttcttagaag 600
caacaaagtg tagttaacac taacttgttc attcttaaat cagtagtcct ctcctcccca 660
aaaagagatc ttaaattatt ttcatttaaa gtcatctact aacaagtaag tntttattca 720
acttaattaa atctaacacc acaagacaat tttgttttag ttattgtttt ggtttgagtt 780
gagttgaaag atttctttnt tttcttctca gcttaccaca gtgcggagac tgctttctga 840
aaaggccact cacgtgaaca ctagggatga agatgagtat acccctcttc atcgagcagc 900
ctacagtgga cacttagata ttgttcagga gctcattgca cagggggccg atgttcatgc 960
agtgactgtg gatggctgga cgcccctgca cagtgcttgt aagtggaata atcccagagt 1020
ggcttctttc ttactgcagc atgatgcaga tatcaatgcc caaacaaaag gcctcttgac 1080
ccccttgcat cttgctgctg ggaacagaga cagcaaggat accctagaac tcctcctgat 1140
gaaccgttac gtcaaaccag ggctgaaaaa caacttggaa gaaactgcat ttgatattgc 1200
caggaggaca agtatctatc actacctctt tgaaattgtg gaaggctgta caaattcttc 1260
acctcagtct taacaattct agtaattttc ctaagtttct aaataccagt gcctcctgtg 1320
tgtgagatgt attcccataa tcaaagttga cgtcaaacat cttactacaa aaattcagtg 1380
acattcatta taacattctt ccaagtgaat tgcccgactt tgatgtcaaa atgtatttga 1440
aagtaatttg catatatctt taattatttc tgtggagttt gtgatttttt tatcagaaat 1500
aattttaatg tgtgtatact taaaaacttg acacgggttg tacagaaact ggtatttttg 1560
gtgctgatac aagagaaatg tatttttaaa tatcccacat cctggatctt tgttgggtat 1620
ttagtatatt gacatatatt tttataaggt gaggtaactc agaacttaat ttaaaagtct 1680
taaatattct gatacaattc agctgtcttc tctaccttac catagccagt tgctttcatt 1740
ttaaaccaga gcaagtaaca tattagtgac ttgaatcttc ataagttaaa gtaaaaaaca 1800
gcaaaaaacc tagatctttg tcttttagaa cacagaccat tttcaggaaa gcagttagct 1860
aagtgtttaa ttcatgaata ttgtatactg catcccctac cacaatttac acaatcctgt 1920
ggatagtcct acctcaccct ggtcaaccta catgatcctt aagctaatgg cgaatcacga 1980
tgaccttgta gacatgcaca caactatacc tttgtccaac agatcataat atatctgcta 2040
tccaactggt tttacctgcc taatcctact gatttgggca ctgcttgtat agtctctcaa 2100
gttcacagga aatgttgatt ttctaaggtc ctcattttta cagagtatac aggcaaagtg 2160
acaggggaaa aggaattagt ctaagagtaa ggggatgatt attatattga ggctaaaacc 2220
acaaagtggc tcaggcttta aaaaaaaaac actgtggata atgacaaaaa gcataagtaa 2280
aaatatttga gaaaaataaa gtacaagttt tgaacaacac aaaaggcatg antncatttt 2340
tnacctgtgt atgtctttct tggatccaga acattattca tccagcacgc acttagttat 2400
ttaacatcta ctcactcagt ctctccagca gcaatttttg cattgtctat ctagcccctt 2460
tgtgattgtn cccaaagttt tgtcttctca acaccacaac actccagggg aagggaacta 2520
aaccagttgc tctttacttc agttaaattt ttaagatgtc caccaaggct tatctctttc 2580
aagccatcct acgtaaccca gtcaccctag nctaagtaat aatgttattt aatcaaaggt 2640
taaatattta tttttgctta gaacttatta gatcatctca gnaaaagtca gaggtaatat 2700
ttgggcctgg 2710

<210> 4

<211> 2059

<212> DNA

<213> Homo sapiens

<220>

<221> misc_feature

<223> Incyte ID No: 240091.1

<220>

<221> unsure
<222> 1850 

<223> a, t, c, g, or other 



<400> 4 

cgcgctccgg acctggcagg cggcggctgc agggcaggtc caggggccac atggctgagg 60
gggacgcagg gagcgaccag aggcagaatg aggaaattga agcaatggca gccatttatg 120
gcgaggagtg gtgtgtcatt gatgactgtg ccaaaatatt ttgtattaga attagcgacg 180
atatagatga ccccaaatgg acactttgct tgcaggtgat gctgccgaat gaatacccag 240
gtacagctcc acctatctac cagttgaatg ctccttggct taaagggcaa gaacgtgcgg 300
atttatcaaa tagccttgag gaaatatata ttcagaatat cggtgaaagt attctttacc 360
tgtgggtgga gaaaataaga gatgttctta tacaaaaatc tcagatgaca gaaccaggcc 420
cagatgtaaa gaagaaaact gaagaggaag atgttgaatg tgaagatgat ctcattttag 480
catgtcagcc ggaaagttcg gttaaagcat tggattttga tatcagtgaa actcggacag 540
aagtagaagt agaagaatta cctccgattg atcatggcat tcctattaca gaccgaagaa 600
gtacttttca ggcacacttg gctccagtgg tttgtcccaa acaggtgaaa atggttcttt 660
ccaaattgta tgagaataag aaaatagcta gtgccaccca caacatctat gcctacagaa 720
tatattgtga ggataaacag accttcttac aggattgtga ggatgatggg gaaacagcag 780
ctggtgggcg tcttcttcat ctcatggaga ttttgaatgt gaagaatgtc atggtggtag 840
tatcacgctg gtatggaggg attctgctag gaccagatcg ctttaaacat atcaacaact 900
gtgccagaaa catactagtg gaaaagaact acacaaattc acctgaggag tcatctaagg 960
ctttgggaaa gaacaaaaaa gtaagaaaag acaagaagag gaatgaacat taatacctga 1020
aactatagga aaggttaatt tgcctataat tatatataca ttccatagtc atcaaggaat 1080
atattgtgca gagagagtat ccttgactgc ttaagtcagc cagttcagca tggataccaa 1140
cattagcttt tcttcttggt tatatcatct gccaaaaata gagaacttat gatctattca 1200
tgtgtgtttc aggcttattt gggagaacta atttgaactt aatcaccact tcatctaatt 1260
ttagcaaggt aacagttgcc cagggcagta cctgaattaa ctgtccattt cagtacatgt 1320
caagtgcctt tgttaggtgg agaagaaatg tctctagagg aatataaata cctgatttct 1380
tgtcatcgag attcttgtac tgttaaatga atattgcctt ttactgctct ttatggctta 1440
ttggaatagg agctcattta agattgatct tggagagttt cttcttgtga ttttagttca 1500
taagtatgtc acctttcatt ttatagtgtt catcattgag taatggatta agtgaaaatc 1560
caggagtatc catctgcagt tatgtgctga ggtgataatt catccaacat atttgttagc 1620
ataaatatta tgcttcagtt tctgttgcaa attggtgatt gtgaaattac agaaagtgat 1680
tttctagtct gctttttttg tttaattctt gtaatgtaag caataaatat ggagtgtcag 1740
tagtctcctt ccaccccaga aatgtgttgg tgtaacattc tcgtttcttt taacaacctg 1800
ggaagtacct ttcttgtgat cttcactgag gaattagaac tatgatagan gttaggctgt 1860
ggcaaatggg acattcgtag agtgggatag aggtggcaga atgaacctgg tgtagggcag 1920
gagtatgttg tgtagtacat caatttgatg catgctttcc atctgcactc cagacggctt 1980
tctcagttcc aagattttgc agagagaagg agcaaacctt ttcattggaa aaacagaaac 2040
aaccctcccc cccattttt 2059



<210> 5 

<211> 3705 

<212> DNA 

<213> Homo sapiens 

<220> 

<221> misc_feature 

<223> Incyte ID No: 243096.6 

<220> 

<221> unsure 

<222> 13-14, 2121 

<223> a, t, c, g, or other 



<400> 5 

cgcagtgccg gannccgcag cgccggaacc tcagaggcgg gtcgcagcgg cgcagaggag 60
gtcagctgcg ggagcgtttc cggggacggt gccgccatga gattgacccc gcgcgcgctg 120
tgcagcgccg cccaggccgc ctggcgggag aacttccccc tgtgcggtcg cgacgtggcg 180
cgctggttcc cgggccacat ggccaagggg ctgaagaaga tgcagagcag cctgaagctg 240
gtggactgta tcatcgaggt ccacgatgcc cggatcccac tttcaggccg caaccctctg 300
tttcaggaaa cccttgggct taagcctcac ttgctggtcc tcaacaagat ggacttggcg 360
gatcttacag agcagcagaa aattatgcaa cacttagaag gagaaggcct aaaaaatgtc 420 
atttttacca actgtgtaaa ggatgaaaat gtcaagcaga tcatcccgat ggtcactgaa 480 
ctgattggga gaagccaccg ctaccaccga aaagagaacc tggagtactg tatcatggtc 540 
attggggtcc ccaacgtggg caagtcctcc ctcatcaact ccctccggag gcagcacctc 600 
aggaaaggga aagccaccag ggtgggtggc gagcctggga tcaccagagc tgtgatgtcc 660
aaaattcagg tctctgagcg gcccctgatg ttcctgttgg acactcctgg cgtgctggct 720 
cctcggattg aaagtgtgga gacaggcctg aagctggccc tgtgtggaac ggtgctggac 780
cacctggtcg gggaggagac catggctgac tacctgctgt acaccctcaa caaacaccag 840 
cgctttgggt acgtgcagca ctacggcctg ggcagtgcct gtgacaacgt agagcgcgtg 900 
ctgaagagtg tggctgtgaa gctggggaag acgcagaagg tgaaggtgct cacgggcacg 960 
ggtaacgtga acgttattca gcctaactat cctgcggcag cccgtgactt cctgcagact 1020
ttccgccgtg ggctgctggg ttccgtgatg ctggacctcg acgtcctgcg gggccacccc 1080 
ccggctgaga ctttgccctg aacttgtccg ggtagggagg gccggaggca tgtggcctcc 1140 
cagacctcct gacctgggtg gttgaggctc aagacagctc acccggtcca gaagctccat 1200 
gctggtcact agggtgctgt gctctctggc gccccacagc ctggccagct ccagggaccc 1260 
cagttgcagg gcccaagcag gtgggagtgg acaccaggct tcccagtgga cgtccctgag 1320 
cagctccgca tgcttggttc tcccggagct tcctgctcag gcctcttgag aaatggatgc 1380
tgtctcagaa ggagttaaag ctataacctg taacctttaa aatctccagt taaagggcct 1440 
gtttcttact ggcctgtgag gtgcaccgta gtgccttggg cctgtgtgtt aaagctgctc 1500 
tcaccagtgg aacctaagaa atgagcaggt tggcagctag ggtttgtgtt ggaggctttc 1560 
ggtccagtgt cttgcagtcc tacaacaagt gagaggcttg ctgccatcag agaggtttat 1620 
ttcacactta caggcacaca cagacacaga ccagagactc ccagcagcag agcccaagca 1680 
ctggcttcgc ccctcagtgc cctggggcat gttcagggca gggttgaggg ggacgccctg 1740 
cacatggctt tgctgtgcaa tgactggaag gccgcccggc atgggcagta gagacccctg 1800 
gcctctgagc accttctagc tcactggtag tgggattctg cattagtggg gctgagagat 1860 
gtgggggccc ctccagcccc attatagtgc acctgaaggg gtccacagcc tgtgtcctag 1920 
aagagggaag aggaaggaag gtgggtgggg ctggtagtat ggactaaggt gctgcaggac 1980 
ctggggccag ggacatcctg tgcagaagct ccggctgcct ctttgcggtg gtggcctgac 2040 
cgtcccacag cagcctccac cagggccctg gtgctcagtg gcccctcttt gctggctggc 2100 
tgcctctgct gccccatacc ncacacactc atcagcctga agttagcccc tgagtgccac 2160 
ctgcatcgtg ccataaccct gaccgcctgg ggcaggaagt attcaggttg gctgtgtcag 2220 
atgctaatgt gctgaatcaa cagtcattgc agatcacgaa gtgtccatca taactggaac 2280 
attccatcag cttgcagtgc tgtggtgtgt gagggtctgg tgcagctcag cccattttcc 2340
aggtgggcat ctgcaaagtt gagggggctc cggtgggtct ctctgctgtg aggagactca 2400
gaccaccccc tgcctcctgg gggaaatgtc agaagggctt ctctgcctat gaggatctgg 2460 
ggcagggctt ggccttggcc tgctggtctt ggaggcgttg agcttggtct ggaaggggtg 2520 
gaggagcgtc tgggctcact gggccagggg cattgctggc agtgtggacg ggaggctgca 2580 
gggcgctgcc tcctgtggct tagtgccctg gagctagaga gcagtgcttg gttgagtcct 2640 
gccaacagct tccagatcct cacccaggcc agaacccagg ccagctgggg aaggcagagg 2700 
ctggcagggc ccgtggtggg tgctggtctt gactttggtg tccactgagt cccgaggctc 2760 
aggcccagga gggatgcagt ccggctgagg gcgaggctgt caccaggaca tggagagggt 2820
gagatcccaa ggccacgggg gggggggcag ggagaacccc tcctaccctg gatgagtggg 2880 
tgactggaga gctagagaac gtggcagacc caagacctct cagtgctgag cccatggagg 2940 
atgccccagg ctggcgggac tgggaagcag agggctggtc ttaacacagg tgtgtccagt 3000
gctggaggca agtccttgtc gtgactgtcc agcgccactc catgtctctc ctgtccttgg 3060 
atgttggggg gctcagcctc ttgcatgggt gtcctgctgg gcgctgggcc ccgccactgg 3120 
ccccctgctt gctttggggt ctgagttagc tcctggctcc actgagcagg ccgtcagctg 3180 
ccagcccacc acgcggatac ccaggccctg ttccgaggcc tggaacagct gcttccgaag 3240 
aaggggctgc cttcagggaa atgcgtgcac cgtgcagcct gtgctgtgcc cagggaggcc 3300 
tcttcagcgg gattggcagt tgctgtgccc tgagaacagg cagaactgtg tgatccctga 3360
atgtgaacct gaagttcaaa ggacttggaa agctctggaa tgtgttggtt tttccccccc 3420 
aaaatgggtc ctaaggaggg taaagtgact tgtttcaagt tgttggagca aagtgggtct 3480 
ctcacggatc tcggcctgag ggtgtggggg agaaggcctg gacagcccct cagggcaggg 3540
tgtgttttcc caccagccgc agagagccag gatggacgtt cctcggacgg acggttttcc 3600 
tgcttgggaa tgttcctggg ctgtgagatc cactcttctg ggcaggtggt tagcacctaa 3660 
cgtttttccc tcacttcccc ccaaattctt aagtcctttg gtcca 3705 

<210> 6 
<211> 3644 
<212> DNA 

<213> Homo sapiens 

<220> 

<221> misc_feature 

<223> Incyte ID No: 244366.6 

<400> 6 

gttttccacc tgcaaccatt tgcatgtgta cagcctactg tttgtctcca gtttttaaac 60
tgtacaagtt gtgtttctta atcttccctt ctgccttgtt ctggggaggt ggttattcat 120
catttggaat cacctttccc cctcccatgt gctttccttc atttgagatc ttttgacctt 180
tggctttatt tgggaggggg aagggtgata aagttttctg tttccctggt tttcttttgt 240
actcctctct gttgcttccc tcttcccatt ttcttgtctg ttctgccgct gtgtgggcct 300
gggctatgcg gcagggcaga tttcccatca gagctccaac atgcccgcag agtctggaaa 360
gagattcaaa cccagcaagt atgtcccggt ctctgcagcc gccatcttcc tagtgggagc 420
tacgacactc ttctttgcct ttacgtgtcc aggactaagc ctgtatgtgt cacctgcagt 480
gcccatctac aatgcaatta tgtttctctt tgtgttggcc aacttcagca tggccacctt 540
catggaccca gggattttcc ctcgagctga ggaggatgag gacaaggaag atgatttccg 600
agctcccctt tacaaaacag tggagataaa gggcatccag gtgcgcatga aatggtgtgc 660
cacctgccgc ttttaccgtc cccctcgatg ttcccactgc agtgtctgtg acaactgtgt 720
ggaggaattt gatcatcact gcccctgggt gaataactgt attggtcgcc ggaactaccg 780
ttattttttc cttttcctcc tttccctgac agcccacatt atgggtgtgt ttggctttgg 840
cctcctttat gtcctctacc acatagagga actctcaggg gtccgcacgg ctgtcacaat 900
ggcagtaatg tgtgtggctg gcttattctt catccctgta gctggcctca cgggatttca 960
cgtggttctg gtggccaggg gacgcacaac caatgaacag gttacgggta aattccgggg 1020
aggtgtgaac cccttcacca atggctgctg taacaatgtc agccgtgttc tctgcagttc 1080
tccagcaccc aggtatttgg ggagaccaaa gaaagagaag acaattgtaa tcagacctcc 1140
cttccttcga ccagaagttt cagatgggca gataactgtg aagatcatgg ataatggcat 1200
ccagggagag ctgaggagaa caaagtctaa gggaagcctg gagataacag agagccagtc 1260
tgcagatgct gaacctccac ctcctcctaa gccagacctg agccgttaca cagggttgcg 1320
aacacacctc ggcctggcta ctaatgagga tagtagctta ttggccaagg acagcccccc 1380
gacacctacc atgtacaagt atcggccggg ttacagtagc agcagtaccg tcagctgcca 1440
tgccgcattc ctccagcgcc aagttgagtc gtggggacag cttgaaggag ccaacctcaa 1500
ttgcagagag cagccgtcac cccagctacc gctcagagcc cagcttggaa ccagagagct 1560
tccgttctcc tacctttggc aaaagttttc acttcgatcc actatccagt ggctcacgct 1620
cctccagcct caagtcagcc cagggcacag gctttgagct gggccagttg caatccattc 1680
gttcagaggg caccacctcc acctcctata agagcctggc caaccagaca cgcaatggaa 1740
gcctatctta tgacagcttg ctcacacctt cagacagccc tgattttgag tcagtgcagg 1800
cagggcctga gccagaccca cctttaggct atacctctcc cttcctgtca gccaggctgg 1860
cccagcaacg ggaagctgag aggcacccac gtttggtgcc aactggccca acacaccgag 1920
agccctcacc agtccgttac gacaatctgt cgcgccacat tgtggcctct ctccaggaac 1980
gagagaagtt gctgcgccag tcacccccac tcccgggccg tgaggaagaa ccaggcttgg 2040
gggactcagg cattcagtcg acaccaggct cgggccatgc ccctcgtact agttcctcct 2100
cagatgattc aaagagatca cctttgggca agactccact gggacgccca gctgtccccc 2160
gttttggcaa gccagatggg ctaaggggcc ggggagtagg gtcccctgaa ctcaggccca 2220
acagccccat acctgggccg atcgatgtct tacagcagcc aaaaagccca acctggtgtc 2280
tctgagacag aagaagtggc cttgcagcca ttactgacac ccaaagatga agtacagctg 2340
aagaccacct acagcaaatc caacgggcag cccaagagct taggctcagc ctcccctggc 2400
ccaggccagc cacctctcag tagccccacg aggggaggag tcaagaaggt gtcaggggtt 2460
ggtggtacca cctatgagat ttcggtgtga gccttcggca cctcccctcc ccaacgcctc 2520
tgcgcctaca ccaaagggcc ccaggtggcc accttccttc cctcaagggg ctcccctccc 2580
gtgcatggac attttttaaa ccaccgattc caagaggatg aggagtgttt tctaaaatgc 2640
agtaggcttg gggagtcgga gagttggggc cctgagactg gggtagcaac cccccctttt 2700
atcttttaag accttccctt ccttgatccc tggaccagac tcagtggaca tttgtgcaat 2760
tgctcgccct ggagggaacc agatcatttt taaaccagaa ataatttttt ttattattgt 2820
tacggattct atttttttcc tcttctgcgt taccaggtgt gtgtgtacat ataatatata 2880
tatatatata ttataaatat caaagaaatt atatatctat cctgggatgg gaaaatgagg 2940
gagggataca tatacggagg gggatcttac tcttcccatt cctcagacca gcaggaaaag 3000
aggggagacg tcagtctttt tcctgtggtt ccctctcatt tgtcccagtt actaactacg 3060
ggaaatagca tcctctgctg gtgctaagtg tgattaggaa gaagcctggg gagaggcgag 3120
tctggaattt tggtcacaag agggaaggac ttggagagga gaattagttt tctaggctca 3180
ttggcattta gtttccctag gaaaggggtc aaaacttcaa gacactggtg gtggtgggag 3240
atcaggaaaa taacttggcc tagctcaaac aatattggat aatcccctcc ttgggggaga 3300
gggattagag tgtgctccta ctggcccctt ggagcctccc ctagcttaca cagttaactt 3360
gattttaaaa tccaaggcca ggagagaaga atccaaaaag caatattttt catcacatgc 3420
caaaaacggg ggatagagag aaggagtggc aggcctaggc ccctccgatt gtcccttggg 3480
ggttacccct cagcccacct cactatggtg ctgggtagag gggatacctg ggttctaacc 3540
tctaaatagg ggagatccca gcctccacaa agaggccctt ttatttttta ttctgattag 3600
ccattttaaa ccaacgagga ataaaaagaa atcctgatct aaaa 3644



<210> 7 

<211> 3117 

<212> DNA 

<213> Homo sapiens 



<220> 

<221> misc_feature 

<223> Incyte ID No: 405313.4 

<220> 

<221> unsure 

<222> 64, 521, 534, 547 

<223> a, t, c, g, or other 



<400> 7 

gtccgcccgc ggtcccggcg gcgccaggtg cgttcactct gcccggctcc agccagcgtc 60
cgcngccgcc gtagctgccc caggctcccc gccccgctgc cgagatggcg acgcgctcct 120
gtcgggagaa ggctcagaag ctgaacgagc agcaccagct catcctatcc aagcttctga 180
gggaggagga caacaagtac tgcgccgact gcgaggccaa aggtcctcga tgggcttcct 240
ggaatattgg tgtgtttatt tgcatcagat gtgctggaat tcatagaaat cttggggttc 300
atatatccag ggtcaaatca gtcaacctag accaatggac agcagaacag atacagtgca 360
tgcaagatat gggaaatact aaagcaagac tactctatga agccaatctt ccagagaact 420
ttcgaagacc acagacagat caagcagtgg aatttttcat cagagataaa tatgaaaaga 480
agaaatacta cgataaaaat gccatagcta ttacaaataa ngaaaaggaa aaanaaaagg 540
aagaganaaa gagagaaaag gagccagaaa agccggcaaa accacttaca gctgaaaagc 600
tgcagaagaa agatcagcaa ctggagccta aaaaaagtac cagccctaaa aaagctgtgg 660
agcccactgt ggatctttta ggacttgatg gccctgctgt ggcaccagtg accaacggga 720
acacaacggt gccacccctg aacgatgatc tggacatctt tggaccgatg atttctaatc 780
ccttacctgc aactgtcatg cccccagctc aggggacacc ctctgcacca gcagctgcaa 840
ccctgtctac agtaacatct ggggatctag atttattcac tgagcaaact acaaaatcag 900
aagaagtggc aaagaaacaa ctttccaaag actccatctt atctctgtat ggcacaggaa 960
ccattcaaca gcaaagtact cctggtgtat ttatgggacc cacaaatata ccatttacct 1020
cacaagcacc agctgcattt cagggctttc catcgatggg cgtgcctgtg cctgcagctc 1080
ctggccttat aggaaatgtg atgggacaga gtccaagcat gatggtgggg catgcccatg 1140
ccccaatggg tttatgggaa atgcacaaac tggtgtgatg ccacttcctc agaacgttgt 1200
tggcccccaa ggaggaatgg tgggacaaat gggtgcaccc cagagtaagt ttggcctgcc 1260
gcaagctcag cagccccagt ggagcctctc acagatgaat cagcagatgg ctggcatgag 1320
tatcagtagt gcaaccccta ctgcaggttt tggccagccc tccagcacaa cagcaggatg 1380
gtctggaagc tcatcaggtc agactctcag cacacaactg tggaaatgaa aactgcaata 1440
caagtttcat ccagaactac cacctgacat tccttgctga aacgcatcta gttcccctgt 1500
ttattcatat gcatattttt tttcttttta cccatttgtt catattaaga atgatctgat 1560
tgaccgtgtt ggtctgtact gattcaattt gatgtggtga aaagcaggtt gataaatcat 1620
tttatgtcaa gggcagcttt gctcatattt cccatgattt catgtactgc attatttgag 1680
aagctgctca acttgcaaaa tcagttttcc tctcaataaa attatagctc taatgtttgc 1740
atataaggga agtagttatc atgttagtaa tacctctaat agtataaacc ccaccccaaa 1800
attagccagt aatcctgtag gaaggtactg tatgatcaaa tgtttaatca tataaataga 1860
atgtaaatgt ctcactgagc actgttttct agtgtatcaa aatgctctta tttcatcatt 1920
cacttcactg tgctgttgtt atgatgtgct taacagggaa cgtgattagt gaaaggaaga 1980
taaacgtgga tgttactcca aaacttcgtt taatgaatgc ttaaagaatt caaattttat 2040
ctgcctctct tgtaatttgg atctcttctt aatgtacata gtgctaacat gaagaccttt 2100
ttctgcacta tatgcaaaca gggtaactaa ctaaaacaaa gccactttca atcttcaatc 2160
cttgaaggta catctaggtt tatgacagta attgtgttta cattttatgg tgcctagtat 2220
tgacaaaatg ttatttccct acattaaaca tgactccata gaccttttca tatgtgggtt 2280
tttatttcct atgatgtata ctgccactaa ccttccaaaa attacttagt attgcaaagt 2340
caggaatcat caggaacgtt tagctgacaa aatacttgtc tgttttaaaa acctgttcaa 2400
gtctaccaac ctgttcaagt ctaccaatta taagggcaaa ttggagaaaa agaaaaaata 2460
tatactcaag agtggtatct tgcagtatcg gcactgtaca aaaaaatctt ccaatttagt 2520
tgttgtagag aaaacatgca gaacaaatga agacaaaaca tacattttgt accaaccatc 2580
caattagctt atgttaactg acaagctcca tttaaacaga tgtccatcag atgacaagaa 2640
aggctgctgt actgaagtaa aacaaacaat acctgaatgc tctgtagcct aaactccaaa 2700
catcctcttc catatggatc cactggctgg acaaactgca ccagttgctg cttcaattta 2760
tacctcaatt ttcactgtgt ccaggtggta ctttggctcg ttggctagat taaccttctc 2820
tgtccgagtg tgccacacga gaacctgaag gggaaggaaa tagcttgggt agcgcactct 2880
tcatggtgac actcgaggtc gggcagcaca agtgtaatga ataccttagt gcagttattt 2940
gctttcggtt ccagttcttc gactgttgtt atctgtttga gaaagtcaga ttcttgcatc 3000
cctggctggg atccacgacg cttaaataca gcttttggat tggacaaaat gacttgaaga 3060
cttacagcaa atcctttgtg aaaaataaaa aaaaaaaaag agactttaaa aaaaaaa 3117



<210> 8 

<211> 2235 

<212> DNA 

<213> Homo sapiens 



<220> 

<221> misc_feature

<223> Incyte ID No: 436857.2 

<220> 

<221> unsure 
<222> 289-319 

<223> a, t, c, g, or other 



<400> 8 

ttcatcccgt atctgcgcgt atgagatgca ttgtctcttc ctctgcagtt gagctgaatg 60
aatacctccg aagccgcttt gttctccaga tgtgaatagc tccactatac cagcctcgtc 120
ttccttccgg gggacaacgt gggtcagggc acagagagat atttaatgtc accctcttgg 180
ggctttcatg ggactccctc tgccacattt tttggaggtt gggaaagttg ctagaggctt 240
cagaactcca gcctaatgga tcccaaactc gggagaatgg ctgcgtccnn nnnnnnnnnn 300
nnnnnnnnnn nnnnnnnnng cggcacgttc tcctcaccct ccccgccccc ggcgctgtta 360
gagaaagtct tccagtacat tgacctccat caggatgaat ttgtgcagac gctgaaggag 420
tgggtggcca tcgagagcga ctctgtccag cctgtgcctc gcttcagaca agagctcttc 480
agaatgatgg ccgtggctgc ggacacgctg cagcgcctgg gggcccgtgt ggcctcggtg 540
gacatgggtc ctcagcagct gcccgatggt cagagtcttc caatacctcc cgtcatcctg 600
gccgaactgg ggagcgatcc cacgaaaggc accgtgtgct tctacggcca cttggacgtg 660
cagcctgctg accggggcga tgggtggctc acggacccct atgtgctgac ggaggtagac 720
gggaaacttt atggacgagg agcgaccgac aacaaaggcc ctgtcttggc ttggatcaat 780
gctgtgagcg ccttcagagc cctggagcaa gatcttcctg tgaatatcaa attcatcatt 840
gaggggatgg aagaggctgg ctctgttgcc ctggaggaac ttgtggaaaa agaaaaggac 900
cgattcttct ctggtgtgga ctacattgta atttcagata acctgtggat cagccaaagg 960
aagccagcaa tcacttacgg aacccggggg aacagctact tcatggtgga ggtgaaatgc 1020
agagaccagg attttcactc aggaaccttt ggtggcatcc ttcatgaacc aatggctgat 1080
ctggttgctc ttctcggtag cctggtagac tcgtctggtc atatcctggt ccctggaatc 1140
tatgatgaag tggttcctct tacagaagag gaaataaata catacaaagc catccatcta 1200
gacctagaag aataccggaa tagcagccgg gttgagaaat ttctgttcga tactaaggag 1260
gagattctaa tgcacctctg gaggtaccca tctctttcta ttcatgggat cgagggcgcg 1320
tttgatgagc ctggaactaa aacagtcata cctggccgag ttataggaaa attttcaatc 1380
cgtctagtcc ctcacatgaa tgtgtctgcg gtggaaaaac aggtgacacg acatcttgaa 1440
gatgtgttct ccaaaagaaa tagttccaac aagatggttg tttccatgac tctaggacta 1500
cacccgtgga ttgcaaatat tgatgacacc cagtatctcg cagcaaaaag agcgatcaga 1560
acagtgtttg gaacagaacc agatatgatc cgggatggat ccaccattcc aattgccaaa 1620
atgttccagg agatcgtcca caagagcgtg gtgctaattc cgctgggagc tgttgatgat 1680
ggagaacatt cgcagaatga gaaaatcaac aggtggaact acatagaggg aaccaaatta 1740
tttgctgcct ttttcttaga gatggcccag ctccattaat cacaagaacc ttctagtctg 1800
atctgatcca ctgacagatt cacctccccc acatccctag acagggatgg aatgtaaata 1860
tccagagaat ttgggtctag tatagtacat tttcccttcc atttaaaatg tcttgggata 1920
tctggatcag taataaaata tttcaaaggc acagatgttg gaaatggttt aaggtccccc 1980
actgcacacc ttcctcaagt catagctgct tgcagcaact tgatttcccc aagtcctgtg 2040
caatagcccc aggattggat tccttccaac cttttagcat atctccaacc ttgcaatttg 2100
attggcataa tcactccggt ttgctttcta ggtcctcaag tgctcgtgac acataatcat 2160
tccatccaat gatcgccttt gctttaccac tctttccttt tatcttatta ataaaaatgt 2220
tggtctccac cactg 2235



<210> 9 

<211> 542 

<212> DNA 

<213> Homo sapiens 



<220> 

<221> misc_feature

<223> Incyte ID No: 247285.1.j



<400> 9 

cggggactga gggaagaagt gaaaatcgga ctgccaggcg acagttcctc cgtttgaaat 60
ctcgccggct cctgagcggg ccaccgggcc cgggctgggg gtctggcggg agaaataact 120
ttatttggac tgagagctgg agaatgagaa taggacctga gagtatattg ggctaaggag 180
gagaggtgtt tgagcccaga tgagtcatgg ctggacgacc cctccgcata ggagatcagc 240
tggttctgga agaagattat gatgagacct acattcctag tgagcaagaa attcttgaat 300
ttgcccggga gattggtatt gatcccatca aggaaccaga actgatgtgg ctggcgcgag 360
agggcatcgt ggccccactg cctggagagt ggaaaccatg ccaggacatc acaggtgaca 420
tttactattt caacttcgcc aacgggcagt ctatgtggga ccatccatgt gacgaacact 480
atcggagctt ggtgatccaa gagcgggcaa agctgtcaac ttctggggcc attaagaaga 540
ag 542



<210> 10 

<211> 358 

<212> DNA 

<213> Homo sapiens 



<220> 

<221> misc_feature 

<223> Incyte ID No: 254510.1.j



<400> 10 

cggacggcgt ggagtgactg tcccaccgcc gcgggattga cttctaaaga ctcttggtac 60
ctgaggaaga aacccggaag aggaagagga gagcaaagga gtcagggatg gctttttctc 120
agggtctatt gacattcagg gatgtggcca tagaattctc tcaggaggag tggaaatgcc 180
tggaccctgc tcagaggact ctatacagag acgtgatgct ggagaattat aggaacctgg 240
tctccctgga taccccttcc aaatgcatga tgaagatgtt ctcatcaaca ggacaaggca 300
atacagaagt ggtccacaca gggacattgc aaatacatgc aagtcatcac attggaga 358

<210> 11

<211> 1481

<212> DNA

<213> Homo sapiens



<220> 

<221> misc_feature 

<223> Incyte ID No: 284125.2.j



<400> 11 

gtgttgcgcg actggccttg agggagagct ggggcctgct cccggagaga tacggctatg 60

tcgatcgaaa tcgaatcttc ggatgtgatc cgccttatta tgcagtactt gaaggagaac 120

agtttacatc gggcgttagc caccttgcag gaggagacta ctgtgtctct gaatactgtg 180

gacagcattg agagttttgt ggctgacatt aacagtggcc attgggatac tgtgttgcag 240

gctatacagt ctctgaaatt gccagacaaa accctcattg acctctatga acaggttgtt 300

ctggaattga tagagctccg tgaattgggt gctgccaggt cacttttgag acagactgat 360

cccatgatca tgttaaaaca aacacagcca gagcgatata ttcatctgga gaaccttttg 420 

gccaggtctt actttgatcc tcgtgaggca tacccagatg gaagtagcaa agaaaagaga 480 

agagcagcaa ttgcccaggc cttagctggc gaagtcagtg tggtgcctcc atctcgtctc 540 

atggcattgc tgggacaggc actgaagtgg cagcagcatc agggattgct tcctcctggt 600 

atgaccatag atttgtttcg aggcaaggca gctgtcaaag atgtggaaga agaaaagttt 660 

cctacacaac tgagcaggca tattaagttt ggtcagaaat cacatgtgga gtgtgctcga 720 

ttttctccag atggtcagta tttggtcact gggtctgttg atggattcat tgaagtatgg 780 

aactttacta ctggaaaaat cagaaaggat cttaagtacc aggcccaaga taactttatg 840 

atgatggatg atgctgtcct ctgcatgtgt ttcagcagag atacagaaat gttagcaact 900 

ggggcccaag atggaaaaat caaggtgtgg aagattcaga gtggacaatg tttaaggaga 960 

tttgagaggg cacacagtaa gggtgtcacc tgtctaagct tttctaagga tagcagtcag 1020 

atccttagtg cttcttttga ccagacaatt agaattcatg gtttaaaatc tgggaaaacc 1080 

ctgaaggaat ttcgtggcca ttcctccttt gttaacgaag caacatttac acaagatgga 1140 

cattacatta ttagtgcatc ctctgatggc actgtaaaga tctggaatat gaagaccaca 1200 

gaatgttcaa atacctttaa atccctgggc agcaccgcag ggaccagata ttaccgtcaa 1260 

cagtgtgatt ctacttccta aaaaccctga gcactttgtg gtgtgcaaca gatcaaacac 1320 

ggtggtcatc atgaacatgc aggggcagat tgtcagaagc ttcagttctg gtaaaagaga 1380 

aggtggggac tttgtttgct gtgccctctc tccccgtggt gaatggatct actgtgtagg 1440 

ggaggacttt gtgctctact gtttcagtac agtcactggc a 1481 

<210> 12 
<211> 2439 
<212> DNA 
<213> Homo sapiens 

<220> 

<221> misc_feature 
<223> Incyte ID No: 331554.4.j 

<220> 

<221> unsure 

<222> 7, 19, 41, 624, 1062 
<223> a, t, c, g, or other 

<400> 12 

ccggatntta gtgtcagang cgcccccagc cgggcgggcg nctcagccat ggccctgcgc 
aaggaactgc tcaagtccat ctggtacgcc tttaccgcgc tggacgtgga gaagagtggc 
aaagtctcca agtcccagct caaggtgctg tcccacaacc tgtacacggt cctgcacatc 
ccccatgacc ccgtggccct ggaggaacac ttccgagatg atgatgacgg ccctgtgtcc 
agccagggat acatgcccta cctcaacaag tacatcctgg acaaggtgga ggagggggct 
tttgttaaag agcactttga tgagctgtgc tggacgctga cggccaagaa gaactatcgg 
gcagatagca acgggaacag tatgctctcc aatcaggatg ccttccgcct ctggtgcctc 
ttcaacttcc tgtctgagga caagtaccct ctgatcatgg ttcctgatga ggtggaatac 
ctgctgaaaa aggtactcag cagcatgagc ttggaggtga gcttgggtga gctggaggag 
cttctggccc aggaggccca ggtggcccag accaccgggg ggctcagcgt ctggcagttc 
ctggagctct tcaattcggg ccgntgcctg cggggcgtgg gccgggacaa cctcagcatg 
gccatccacg aggtctacca ggagctcatc caagatgtcc tgaagcaggg ctacctgtgg 
aagcgagggc acctgagaag gaactgggcc gaacgctggt tccagctgca gcccagctgc 
ctctggctac tttgggagtg aagagtgcaa agagaaaagg ggcattatcc cgctggatgc 
acactgctgc gtggaggtgc tgccagaccg cgacggaaag cgctgcatgt tctgtgtgaa 
gacagccacc cgcacgtatg agatgagcgc ctcagacacg cgccaggcca ggagtggaca 
gctgccatcc agatggcgat ccggctgcag gccgagggga agacgtccct acacaaggac 
ctgaagcaga aacggcgcga gcagcgggag cagcgggagc gncgccgggc ggccaaggaa 
gaggagctgc tgcggctgca gcactgcagg aggagaagga gcggaagtgc aggagctgga 
gctgctgcag gaggcgcacg gcaggccgag cggctgctgc aggaggagga ggaacggcgc 
cgcagccagc accgcgagct gcagcaggcg ctcgagggcc aactgcgcga ggcggagcag 
gcccgggcct ccatgcaggc tgagatggag ctgaaggagg aggaggctgc ccggcagcgg 
cagcgcattc aaggagctgg aggatatgca gcagcggttg caggaggccc tgcaactaga 
ggtgaaagct cggcgagatg aagaatctgt gcgaatcgct cagaccagac tgctggaaga 
ggaggaagag aagctgaagc agttgatgca gctgaaggag gagcaggagc gctacatcga 
acgggcgcac aggagaagga agagctgcag caggagatgg cacagcagag ccgctccctg 
cagcaggccc agcagcagct ggaggaggtg cggcagaacc ggcagagggc tgacgaggat 
gtggaggctg cccagagaaa actgcgccag gccagcacca acgtgaaaca ctggaatgtc 
cagatgaacc ggctgatgca tccaattgag cctggagata agcgtccggt caccagcagc 
tccttctcag gcttccagcc ccctctgctt gcccaccgtg actcctccct aaagcgcctg 
acccgctggg gatcccaggg caacaggacc ccctgcgccc aacagcaatg agcagcagaa 
gtccctcaat ggtggggatg aggctcctgc cccggcttcc acccctcagg aagataaact 
ggatccagca ccagaaaatt agcctctctt agccccttgt tcttcccaat gtcatatcca 
ccaggacctg gccacagctg gcctgtgggt gatcccagct cttactagga gagggagctg 
aggtcctggt gccaggggcc caggccctcc aaccataaac agtccaggat ggaacctggt 
tcacccttca taccagctcc aagccccaga ccatgggagc tgtctgggat gttgatcctt 
gagaacttgg ccctgtgctt tagacccaag gacccgattc ctgggctagg aaagagagaa 
caagcaagcc ggggctacct gcccccaggt ggccaccaag ttgtggaagc acatttctaa 
ataaaaactg ctcttagaat gaattattgg ctcaggtctg tccatctctc ctgccatttc 
ctcccttcct ccctcaagcc ccgttatagg ttccaaagag cagtaaaagt ataataaagt 
ggttaagaaa gaccctgcag ctagactgcc tgggttctg 

<210> 13 
<211> 1307 
<212> DNA 
<213> Homo sapiens 

<220> 

<221> misc_feature 
<223> Incyte ID No: 331642.1.j 

<220> 

<221> unsure 
<222> 891 

<223> a, t, c, g, or other 
<400> 13 

ccggctcgct agccgtcctg cgggacgccg gcgctgatgg gttggggaaa tggacgcctg 60 
gagaacggaa atccagttat caaaattgac tcaagaagag agaacctaac agaacaataa 120 
caatggaaga aattgggaac attatcacaa agctaccatc ctgccaaact ccaggctcag 180 
atgtcacagg ttaaaaaaaa gtccttcatg aaaaagaaag atcttaagca gcatgatgga 240 
ttcagaagct catgaaaaga ggccaccaat actaacatct tcaaaacacg atatatcacc 300 
tcatattaca aatgttggtg agatgaagca ttacttgtgt ggctgctgcg ccgtatcgaa 360 
caacaccgca atcacatatc ccattcagaa ggtccccttt cgacaacagc tgtatggcat 420 
caaaacccgg gatgcaatac ttcagttgag aagggatgga tttcgaaatt tgtatcgtgg 480 
aatccttccc ccattgatgc agaagacaac tacgcccgca cttatgtttg gtctgtatga 540 
ggatttatcc tgccttctcc acaagcatgt cagtgctcca gagtttgcaa ccagtggcgt 600 
ggcggcagtg cttgcaggga caacagaagc aattttcact ccactggaaa gagttcagac 660 
attgcttcaa gaccacaagc atcatgacaa atttaccaac acttaccagg ctttcaaggc 720 
actgaaatgt catggaattg gagagtatta tcgaggttgg tgcccattct tttccggaat 780 
ggactcagca atgtcttgtt tttcggcttc gaggtcccat taaggagcat ctgcctaccg 840 
caacgactca cagtgctcat ctggtcaatg attttatctg tggaggtcta ntgggtgcca 900 
tgttgggatt cttgtttttt ccaattaatg ttgtaaaaac tcgcatacag tctcagattg 960 
gtggggaatt tcagtctttc cccaaggttt tccaaaaaat ctggctggaa cgggacagaa 1020 
aactgataaa tcttttcaga ggtgcccatc tgaattacca tcggtccctc atctcttggg 1080 
gcataatcaa tgcaacttat gagttcttgt taaaggttat atgaaaaaac catcagttaa 1140 
gtgccattta tcaactgaat agaccttcta agaagaatgc agtttggcct ctttcttagt 1200 
tggccaaata caagttggtg tcataactcc aggccacagt gagttatggg caaagctgtt 1260 
ttgcttaagc ctcaataaaa cagaataaaa gattccaata ggaaaat 1307 

<210> 14 

<211> 303 

<212> DNA 

<213> Homo sapiens 

<220> 
<221> misc_feature 

<223> Incyte ID No: 445594.2.j 

<220> 

<221> unsure 
<222> 184 

<223> a, t, c, g, or other 



<400> 14 

gcgctctcgg cccacacaat atgacctcgg ggaggatgcg aggaagatga actgtgatga 60 
tccacttctt cttaatgaat gactgactta cctgagaaag aaactcagag gaagaggaaa 120 
gaaagaagag gagggaatgg ctctttctca gggactgttt acattcaagg atgtggccat 180 
aganttctct caagaggagt gggagtgcct ggaccctgcc cagagggcct tgtacaggga 240 
cgtgatgttg gagaactaca ggaacctgct ttctctcgat gaggataaca tccctccaga 300 
aga 303 